
Scale AI

@scale_AI

making AI work

Scale AI reposted

🔄RLHF → RLVR → Rubrics → OnlineRubrics

👤 Human feedback = noisy & coarse
🧮 Verifiable rewards = too narrow
📋 Static rubrics = rigid, easy to hack, miss emergent behaviors

💡We introduce OnlineRubrics: elicited rubrics that evolve as models train.
arxiv.org/abs/2510.07284

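To make the idea concrete, here is a minimal, illustrative sketch of a rubric-based reward with an online update step. The names (Criterion, Rubric, judge, extend) and the weighted-fraction scoring are assumptions for illustration, not the formulation from the paper:

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Criterion:
    description: str          # e.g. "cites at least one primary source"
    weight: float = 1.0

@dataclass
class Rubric:
    criteria: list[Criterion] = field(default_factory=list)

    def score(self, response: str, judge: Callable[[str, str], bool]) -> float:
        # Weighted fraction of criteria the judge says the response satisfies.
        if not self.criteria:
            return 0.0
        total = sum(c.weight for c in self.criteria)
        met = sum(c.weight for c in self.criteria if judge(response, c.description))
        return met / total

    def extend(self, new_criteria: list[Criterion]) -> None:
        # "Online" step: fold in criteria elicited during training, so the rubric
        # can keep up with emergent model behaviors instead of staying static.
        self.criteria.extend(new_criteria)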

Scale AI reposted

Sat down with @lennysan to talk about where AI is headed and how we’re making it work for model builders, enterprises and governments. Also went down memory lane about my time at Uber Eats. 🙂

In his first in-depth interview since taking over as @scale_AI CEO, @jdroege shares:
🔸 What actually happened with Meta’s $14 billion investment
🔸 Where frontier labs are heading next
🔸 Why most enterprise data is useless for AI models
🔸 What it takes to keep making AI model…



“I think one of the misunderstandings is that AI is this magic wand or it can solve all problems, and that’s not true today. But there is a ton of value when you get it right.” Our CEO @jdroege shared his AI success framework with CNN's @claresduffy. cnn.com/2025/09/30/tec…


Scale AI reposted

New @Scale_AI paper!

The culprit behind reward hacking? We trace it to misspecification in the high-reward tail.

Our fix: rubric-based rewards to tell “excellent” responses apart from “great.”

The result: Less hacking, stronger post-training! arxiv.org/pdf/2509.21500

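As a rough illustration of the idea (not the paper’s actual formulation), a rubric score can be used to spread apart responses that a scalar reward model already rates near the top; the threshold and weight below are made-up knobs:

def shaped_reward(base_reward: float, rubric_score: float,
                  tail_threshold: float = 0.8, rubric_weight: float = 0.5) -> float:
    # Toy sketch: below the threshold the scalar reward is used as-is; in the
    # high-reward tail, rubric criteria separate "great" responses from
    # "excellent" ones, where a single scalar reward is easiest to hack.
    if base_reward < tail_threshold:
        return base_reward
    return base_reward + rubric_weight * rubric_score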

Scale AI reposted

We’re introducing SEAL Showdown, the AI leaderboard that actually captures real preferences, powered by a platform used by real people. Public benchmarks today rely on contrived tasks or narrow user groups. That leaves us guessing which models are actually preferred by people.…


Scale AI reposted

🚀 Introducing SWE-Bench Pro — a new benchmark to evaluate LLM coding agents on real, enterprise-grade software engineering tasks. This is the next step beyond SWE-Bench: harder, contamination-resistant, and closer to real-world repos.

