
Linghao Zhang

@starryzhangcs

Staff @XiaomiMiMo. I build verifiable and scalable things for code. Ex @MSFTResearch

Linghao Zhang reposted

man I'm excited, can't wait to reveal soon what @KLieret @_carlosejimenez @OfirPress @lschmidt3 @Diyi_Yang and I have been working on. Where should code evaluation go after SWE-bench? We have an answer ⚔️


Linghao Zhang reposted

If you are impacted by layoff, welcome to dm me


Linghao Zhang reposted

👋 Say Hi to MiMo-Audio! Our BREAKTHROUGH in general-purpose audio intelligence. 🎯 Scaling pretraining to 100M+ hours leads to EMERGENCE of few-shot generalization across diverse audio tasks! 🔥 Post-trained MiMo-Audio-7B-Instruct: • crushes benchmarks: SOTA on MMSU, MMAU,…


Linghao Zhang reposted

GRPO makes reasoning models yap a lot, but there's a simple fix: Sample more responses during training, and train on the shortest ones. This creates a length pressure that makes the model sound much more terse, without sacrificing accuracy!! Examples of GRPO vs GFPO versions…


Thinking Less at test-time requires Sampling More at training-time! GFPO is a new, cool, and simple Policy Opt algorithm coming to your RL Gym tonite, led by @VaishShrivas and our MSR group: Group Filtered PO (GFPO) trades off training-time with test-time compute, in order…
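In code, the recipe these two posts describe comes down to something like the sketch below (a rough Python illustration, assuming hypothetical policy.sample() and reward() helpers; not the authors' GFPO implementation):

# A minimal sketch of the length-filtering idea, not the official GFPO code.
# policy.sample() and reward() are hypothetical stand-ins for the rollout and
# verifier used in training.
def gfpo_group(prompt, policy, reward, group_size=16, keep=4):
    # 1. Sample a larger-than-usual group of responses for this prompt.
    responses = [policy.sample(prompt) for _ in range(group_size)]
    # 2. Retain only the `keep` shortest responses -- the length pressure
    #    the tweet describes.
    kept = sorted(responses, key=len)[:keep]
    # 3. GRPO-style group-normalized advantages, computed on the kept subset only.
    rewards = [reward(prompt, r) for r in kept]
    mean = sum(rewards) / len(rewards)
    std = (sum((x - mean) ** 2 for x in rewards) / len(rewards)) ** 0.5 or 1e-6
    advantages = [(x - mean) / std for x in rewards]
    return kept, advantages

The extra sampling is the training-time cost; the filtering is what buys shorter outputs at test time.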



Linghao Zhang reposted

🚀 MiMo‑VL 2508 is live! Same size, much smarter 🚀 We’ve upgraded performance, thinking control, and overall user experience. 📈 Benchmark gains across image + video: MMMU 70.6, VideoMME 70.8. Consistent improvements across the board. 🤖 Thinking Control: toggle reasoning…


exactly

The biggest lever towards ASI is right in front of you (RL environments) but you feel too smart to work on data



Linghao Zhang reposted

2026+: everyone releases their own OS Building with Claude Code SDK made me realize that we are just a UI away from the next ChatGPT moment. Models are more intelligent than they seem. AI Agents are already unlocking unique and novel experiences. Claude Code is the…


Linghao Zhang reposted

What happens if you compare LMs on SWE-bench without the fancy scaffolds? Our new leaderboard “SWE-bench (bash only)” shows you which LMs are the best at getting the job done with just bash. More on why this is important 👇


Linghao Zhang reposted

🗒️Have been exploring Agent-RL training over the past few months, particularly in GUI scenarios. Here’s a summary of some practical insights and lessons 🤔 learned from the perspective of an industry researcher, and some reference papers.


Linghao Zhang reposted

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-… Key points: 1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial…


based

Releasing mini, a radically simple SWE-agent: 100 lines of code, 0 special tools, and gets 65% on SWE-bench verified! Made for benchmarking, fine-tuning, RL, or just for use from your terminal. It’s open source, simple to hack, and compatible with any LM! Link in 🧵
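The bash-only loop such an agent runs is roughly the following (a hypothetical Python sketch, with llm standing in for any chat-completion call; not the actual mini source):

# A minimal sketch of a bash-only agent loop of the kind the tweet describes.
# `llm` is a hypothetical helper that returns one bash command (or "submit")
# per call, given the conversation so far.
import subprocess

def run_agent(task, llm, max_steps=30):
    messages = [{"role": "user", "content": "Solve this task using only bash commands:\n" + task}]
    for _ in range(max_steps):
        command = llm(messages)
        if command.strip() == "submit":            # model signals it is done
            break
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        observation = (result.stdout + result.stderr)[:4000]  # truncate long output
        messages.append({"role": "assistant", "content": command})
        messages.append({"role": "user", "content": observation})
    return messages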



Linghao Zhang reposted

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains 'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…
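The core idea reduces to a weighted checklist score used as the scalar reward (a rough Python sketch with a hypothetical judge() call; not the RaR authors' implementation):

# A minimal sketch of checklist-style rubric scoring as a reward for GRPO.
# judge(response, criterion) -> bool is a hypothetical LM-as-judge call.
def rubric_reward(response, rubric, judge):
    # rubric: list of (criterion_text, weight) pairs, e.g.
    #   [("Answer states the final value explicitly", 2.0),
    #    ("Answer cites a supporting source", 1.0)]
    total = sum(weight for _, weight in rubric)
    earned = sum(weight for criterion, weight in rubric if judge(response, criterion))
    return earned / total  # normalized reward in [0, 1]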


Linghao Zhang reposted

Kimi K2 tech report just dropped! Quick hits: - MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale - 20K+ tools, real & simulated: unlocking scalable agentic data - Joint RL with verifiable + self-critique rubric rewards: alignment that adapts -…


Linghao Zhang reposted

🌕 Did you notice Kimi’s doodle today at kimi.ai? It’s our little Moon Day surprise - A tribute to the spirit of exploration, and to the day humans first set foot on the Moon 🍻 May Kimi fuel your next big idea 💡


Linghao Zhang reposted

kimi


Linghao Zhang reposted

Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments, and integrating them can take days. We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks. Now…


Linghao Zhang reposted

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…


Linghao Zhang reposted

We built 200k-GPU clusters; We scaled up & curated higher-quality data; We scaled compute by 100x; We developed training & test-time recipes; We made everything RL native; We stabilized infrastructure and sped things up; That's how you scale RL up to pre-training scale. Yet I am…


Linghao Zhang reposted

> opens claude code > write a huge ass prompt > auto-accept edits > go drink water, watch a yt video, chillax, work on something else > come back, work is done. What a time to live in. wow.

