Linghao Zhang
@starryzhangcs
Staff @XiaomiMiMo. I build something verifiable and scalable for code. Ex @MSFTResearch
man I'm excited, can't wait to reveal soon what @KLieret @_carlosejimenez @OfirPress @lschmidt3 @Diyi_Yang and I have been working on. Where should code evaluation go after SWE-bench? We have an answer ⚔️
If you've been impacted by a layoff, feel free to DM me
👋 Say Hi to MiMo-Audio! Our BREAKTHROUGH in general-purpose audio intelligence. 🎯 Scaling pretraining to 100M+ hours leads to EMERGENCE of few-shot generalization across diverse audio tasks! 🔥 Post-trained MiMo-Audio-7B-Instruct: • crushes benchmarks: SOTA on MMSU, MMAU,…
GRPO makes reasoning models yap a lot, but there's a simple fix: Sample more responses during training, and train on the shortest ones. This creates a length pressure that makes the model sound much more terse, without sacrificing accuracy!! Examples of GRPO vs GFPO versions…
Thinking Less at test-time requires Sampling More at training-time! GFPO, a new, cool, and simple Policy Opt algorithm, is coming to your RL Gym tonite, led by @VaishShrivas and our MSR group: Group Filtered PO (GFPO) trades off training-time with test-time compute, in order…
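A rough sketch of the "sample more, keep the shortest" idea described in these two tweets, under the simplest reading: draw a larger group of responses per prompt, retain only the shortest ones, and compute GRPO-style group-normalized advantages on the retained subset. Function and variable names are illustrative, not GFPO's actual implementation.

```python
import numpy as np

def gfpo_filter_and_advantages(responses, rewards, keep_k):
    """Illustrative sketch of group filtering: keep the shortest keep_k
    responses, then compute group-normalized advantages on that subset."""
    # Sort candidates by length (character count as a crude proxy for tokens).
    order = np.argsort([len(r) for r in responses])
    kept = order[:keep_k]

    kept_rewards = np.array([rewards[i] for i in kept], dtype=float)
    # GRPO-style advantage: reward normalized within the retained group.
    advantages = (kept_rewards - kept_rewards.mean()) / (kept_rewards.std() + 1e-6)
    return [(responses[i], a) for i, a in zip(kept, advantages)]

# e.g. sample 8 responses per prompt, keep the 4 shortest; the filtered-out
# samples contribute no gradient, which is where the length pressure comes from.
```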
cuelang.org — CUE: Configure, Unify, Execute. Validate, define, and use dynamic and text-based data. CUE makes it easy to validate data, write schemas, and ensure configurations align with...
🚀 MiMo‑VL 2508 is live! Same size, much smarter 🚀 We’ve upgraded performance, thinking control, and overall user experience. 📈 Benchmark gains across image + video: MMMU 70.6, VideoMME 70.8. Consistent improvements across the board. 🤖 Thinking Control: toggle reasoning…
exactly
The biggest lever towards ASI is right in front of you (RL environments) but you feel too smart to work on data
2026+: everyone releases their own OS Building with Claude Code SDK made me realize that we are just a UI away from the next ChatGPT moment. Models are more intelligent than they seem. AI Agents are already unlocking unique and novel experiences. Claude Code is the…
What happens if you compare LMs on SWE-bench without the fancy scaffolds? Our new leaderboard “SWE-bench (bash only)” shows you which LMs are the best at getting the job done with just bash. More on why this is important 👇
🗒️Have been exploring Agent-RL training over the past few months, particularly in GUI scenarios. Here’s a summary of some practical insights and lessons 🤔 learned from the perspective of an industry researcher, and some reference papers.
Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-… Key points: 1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial…
based
Releasing mini, a radically simple SWE-agent: 100 lines of code, 0 special tools, and gets 65% on SWE-bench verified! Made for benchmarking, fine-tuning, RL, or just for use from your terminal. It’s open source, simple to hack, and compatible with any LM! Link in 🧵
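In the spirit of the "0 special tools" claim, here is a hedged sketch of what a bash-only agent loop can look like: the model gets the task, replies with one shell command per turn, and sees the command's output. This is illustrative only, not the actual mini-swe-agent code; `call_llm` is a placeholder for whatever chat API you use.

```python
import subprocess

def call_llm(messages):
    """Placeholder for any chat-completion API; swap in your provider.
    This stand-in is an assumption, not the interface mini actually uses."""
    raise NotImplementedError

def bash_only_agent(task, max_steps=30):
    """Minimal bash-only agent loop: one command per turn, output fed back."""
    messages = [
        {"role": "system", "content": "Reply with exactly one bash command per turn. Reply DONE when finished."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        command = call_llm(messages).strip()
        messages.append({"role": "assistant", "content": command})
        if command == "DONE":
            break
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=120)
        messages.append({"role": "user", "content": result.stdout + result.stderr})
    return messages
```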
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains 'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…
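A minimal sketch of how a checklist-style rubric can be turned into a scalar reward for on-policy RL, assuming a weighted pass/fail aggregation; the `judge` callable and the weighting scheme are assumptions for illustration, not the paper's exact recipe.

```python
def rubric_reward(response, rubric, judge):
    """Score a response against a (criterion, weight) checklist and return a
    normalized scalar reward in [0, 1] for use in GRPO-style training."""
    total_weight = sum(w for _, w in rubric)
    score = sum(w for criterion, w in rubric if judge(response, criterion))
    return score / total_weight

# Example rubric: (criterion, weight) pairs checked by an LLM judge or heuristic.
rubric = [
    ("States the final answer explicitly", 2.0),
    ("Cites the relevant evidence", 1.0),
    ("Contains no unsupported claims", 1.0),
]
```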
Kimi K2 tech report just dropped! Quick hits: - MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale - 20K+ tools, real & simulated: unlocking scalable agentic data - Joint RL with verifiable + self-critique rubric rewards: alignment that adapts -…
🌕 Did you notice Kimi’s doodle today at kimi.ai? It’s our little Moon Day surprise - A tribute to the spirit of exploration, and to the day humans first set foot on the Moon 🍻 May Kimi fuel your next big idea 💡
kimi
Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments and integrating can take days. We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks. Now…
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
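The "this happened to go well (/poorly), let me slightly…" framing is, read plainly, the basic policy-gradient update; a sketch with a baseline, assuming that reading (names and baseline choice are illustrative):

```python
import torch

def reinforce_update(log_probs, reward, baseline, optimizer):
    """Plain REINFORCE with a baseline: nudge every action in a rollout up if
    it went better than expected, down if worse."""
    advantage = reward - baseline               # went well (+) or poorly (-)
    loss = -(advantage * log_probs.sum())       # up/down-weight the whole trajectory
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```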
We built 200k-GPU clusters; We scaled up & curated higher-quality data; We scaled compute by 100x; We developed training & test-time recipes; We made everything RL native; We stabilized infrastructure and sped things up; That's how you bring RL to pre-training scale. Yet I am…
> opens claude code > write a huge ass prompt > auto-accept edits > go drink water, watch a yt video, chillax, work on something else > come back, work is done. What a time to live in. wow.