You might like
🔄 RLHF → RLVR → Rubrics → OnlineRubrics
👤 Human feedback = noisy & coarse
🧮 Verifiable rewards = too narrow
📋 Static rubrics = rigid, easy to hack, miss emergent behaviors
💡 We introduce OnlineRubrics: elicited rubrics that evolve as models train.
arxiv.org/abs/2510.07284

Sat down with @lennysan to talk about where AI is headed and how we’re making it work for model builders, enterprises and governments. Also went down memory lane about my time at Uber Eats. 🙂
“I think one of the misunderstandings is that AI is this magic wand or it can solve all problems, and that’s not true today. But there is a ton of value when you get it right.” Our CEO @jdroege shared his AI success framework with CNN's @claresduffy. cnn.com/2025/09/30/tec…
New @Scale_AI paper! The culprit behind reward hacking? We trace it to misspecification in the high-reward tail. Our fix: rubric-based rewards to tell "excellent" responses apart from "great." The result: less hacking, stronger post-training! arxiv.org/pdf/2509.21500


We’re introducing SEAL Showdown, the AI leaderboard that actually captures real preferences, powered by a platform used by real people. Public benchmarks today rely on contrived tasks or narrow user groups. That leaves us guessing which models are actually preferred by people.…