Matthew Siegel

@LargerLanguage

AI technical writer and poet. Not simultaneously. Research, warm takes, and cool stuff we're doing at @Scale_AI. For poems: @MatthewSiegel_

Joined December 2024

89 Posts, 81 Followers, 76 Following

Pinned

Matthew Siegel

@LargerLanguage

Aug 14

thoroughly excited to launch this video series where I break down new research from @Scale_AI. this first one covers how we knock out 90% of errors in data that require human review, using an autorater. check it out 👀

Matthew Siegel

@LargerLanguage

Oct 10

Lots of people are talking about an AI bubble. Sure. I don't think these people are reading the same research I'm reading.

Matthew Siegel reposted

Bing Liu

Oct 1

New @Scale_AI paper! The culprit behind reward hacking? We trace it to misspecification in the high-reward tail. Our fix: rubric-based rewards to tell “excellent” responses apart from “great.” The result: Less hacking, stronger post-training! arxiv.org/pdf/2509.21500

Matthew Siegel

@LargerLanguage

Sep 24

This was a HUGE lift from our research team, huge thanks to everyone who contributed to this benchmark. ...and the blog doesn't look so bad either: scale.com/blog/swe-bench…

SWE-Bench Pro raises the bar for coding benchmarks with diverse, real-world, contamination-resistant tasks.

SWE-Bench Pro: Raising the Bar for Agentic Coding | Scale

Source: scale.com

Bing Liu

Sep 20

🚀 Introducing SWE-Bench Pro — a new benchmark to evaluate LLM coding agents on real, enterprise-grade software engineering tasks. This is the next step beyond SWE-Bench: harder, contamination-resistant, and closer to real-world repos.

Matthew Siegel

@LargerLanguage

Sep 22

Working on this leaderboard page and blog was a LIFT! Huge thank you to every cook in the kitchen!! 🧑‍🍳👩‍🍳👨‍🍳

Scale AI

Sep 20

Introducing our Agentic Leaderboards. These new leaderboards test AI agents in real-world, high-complexity environments, setting a new standard for completing end-to-end digital tasks.
