Wenting Zhao
@wzhao_nlp
reasoning & llms @Alibaba_Qwen. Opinions are my own.

Team Eric 🫡

.@ericzelikman & 7th Googler @gharik are raising $1b for an AI lab called Humans&. I'm told Eric's paper STaR was an inspiration for OpenAI's reasoning models, and that he was also one of the star AI researchers labs fought over. forbes.com/sites/annatong…



Wenting Zhao reposted

Many people are confused by MiniMax's recent return to full attention - especially since it was the first large-scale pivot toward hybrid linear attention - and by Kimi's later adoption of hybrid linear variants (as well as earlier attempts by Qwen3-Next, or Qwen3.5). I actually…


Wenting Zhao reposted

MiniMax M2 Tech Blog 3: Why Did M2 End Up as a Full Attention Model? On behalf of pre-training lead Haohai Sun. (zhihu.com/question/19653…)

I. Introduction
As the lead of MiniMax-M2 pretrain, I've been getting many queries from the community on "Why did you turn back the clock…
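For readers not steeped in the terminology, here is a rough sketch of what "hybrid linear attention" means in this context. This is my own illustrative PyTorch, not the MiniMax, Kimi, or Qwen design; the layer ratio, feature map, and dimensions are placeholder assumptions. The idea: most layers use an O(N) linear-attention approximation, with full softmax attention interleaved every few layers.

```python
# Illustrative sketch only: a "hybrid" stack interleaving linear-attention blocks
# with standard full (softmax) attention blocks. Not the MiniMax/Kimi/Qwen design;
# the ratio, the elu+1 feature map, and all dimensions are placeholder choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """O(N) attention: softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V).
    Non-causal here for brevity; decoder variants use cumulative sums instead."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1           # positive feature map
        kv = torch.einsum("bnd,bne->bde", k, v)     # sum_n phi(k_n) v_n^T
        z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)  # normalizer
        return self.out(torch.einsum("bnd,bde,bn->bne", q, kv, z))

class FullAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        return self.attn(x, x, x, need_weights=False)[0]

def hybrid_stack(dim, n_layers=8, full_every=4):
    # e.g. 3 linear-attention layers followed by 1 full-attention layer, repeated
    return nn.ModuleList(
        FullAttention(dim) if (i + 1) % full_every == 0 else LinearAttention(dim)
        for i in range(n_layers)
    )
```

The question the blog post debates is essentially whether the efficiency of the linear layers is worth the quality trade-offs compared with keeping full attention everywhere.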



If you happen to be in Shanghai next Monday, come hang out with us 🤩

We will have a pre-EMNLP workshop about LLMs next Monday at @nyushanghai campus! Speakers are working on diverse and fantastic problems, really looking forward to it! We also provide a zoom link for those who cannot join in person :) (see poster)



Wenting Zhao reposted

One personal reflection is how interesting a challenge RL is. Unlike other ML systems, you can't abstract much from the full-scale system. Roughly, we co-designed this project and Cursor together in order to allow running the agent at the necessary scale.


Wenting Zhao reposted

Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on!

📘 We're excited to release 《The Principles of Diffusion Models》, with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon.

It traces the core…


The question I got asked most frequently during COLM this year was what research questions can be studied in academia that will also be relevant to frontier labs. So I'm putting together a talk on this. What topics / areas should I cover? RL, eval, pretraining?


Wenting Zhao reposted

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…

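To make the idea concrete, here is a minimal sketch of an on-policy distillation step, assuming the usual recipe of sampling rollouts from the student and supervising every generated token with a frozen teacher's log-probs. This is not the code from the post; the checkpoints, loss details, and hyperparameters are placeholders.

```python
# Minimal sketch of on-policy distillation (placeholder models and hyperparameters).
# The student samples its own rollouts (like RL), and the teacher provides per-token
# log-probs as a dense supervision signal (like SFT).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

student = AutoModelForCausalLM.from_pretrained("student-model")   # hypothetical checkpoint
teacher = AutoModelForCausalLM.from_pretrained("teacher-model")   # larger, frozen teacher
teacher.eval()

opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

def distill_step(prompt_ids):
    # 1) On-policy: the student generates its own continuation (like an RL rollout).
    rollout = student.generate(prompt_ids, max_new_tokens=128, do_sample=True)

    # 2) Dense supervision: score every position under both models.
    student_logits = student(rollout).logits[:, :-1]
    with torch.no_grad():
        teacher_logits = teacher(rollout).logits[:, :-1]

    # 3) Reverse KL on the student's own trajectory: per-token feedback (unlike a
    #    single scalar reward), but on states the student actually visits (unlike SFT).
    #    In practice you would mask the prompt positions and train only on generated tokens.
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```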

Wenting Zhao reposted

Cursor team is stacked on X, shortlist for insider updates:
• @ryolu_ - design
• @ericzakariasson - dev rel
• @TheRohanVarma - product
• @leerob - dev rel
• @JuanRezzio - QA engineering
• @davidrfgomes - engineering
• @austinnickpiel - engineering
• @milichab - product…


Wenting Zhao reposted

🌶️SPICE: Self-Play in Corpus Environments🌶️
📝: arxiv.org/abs/2510.24684
- Challenger creates tasks based on *corpora*
- Reasoner solves them
- Both trained together ⚔️ -> automatic curriculum!
🔥 Outperforms standard (ungrounded) self-play
Grounding fixes hallucination & lack of…


Wenting Zhao reposted

it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids? RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most…

[Quoted video card: "Let's build the GPT Tokenizer" (youtube.com)]
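A quick way to see the phenomenon for yourself, using the GPT-2 tokenizer purely as a stand-in (this is not Agent Lightning's code, and the character-by-character split is an artificial but deterministic way to produce a non-canonical tokenization):

```python
# Minimal demo of "retokenization drift": tokenize(detokenize(ids)) != ids.
# In RL pipelines the trainer often re-tokenizes text that the policy generated;
# if the generated token sequence was not the canonical BPE encoding, they diverge.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# Pretend the policy emitted "hello" one character at a time (a deliberately
# non-canonical tokenization; constrained or streamed decoding can produce such splits).
generated_ids = sum((tok(c, add_special_tokens=False).input_ids for c in "hello"), [])

text = tok.decode(generated_ids)                              # "hello"
trainer_ids = tok(text, add_special_tokens=False).input_ids   # canonical BPE encoding

print(generated_ids)   # five single-character tokens
print(trainer_ids)     # far fewer tokens, e.g. a single "hello" token
assert trainer_ids != generated_ids   # the trainer "sees" different tokens than the model produced
```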


Wenting Zhao reposted

Below is a deep dive into why self play works for two-player zero-sum (2p0s) games like Go/Poker/Starcraft but is so much harder to use in "real world" domains. tl;dr: self play converges to minimax in 2p0s games, and minimax is really useful in those games. Every finite 2p0s…


Self play works so well in chess, go, and poker because those games are two-player zero-sum. That simplifies a lot of problems. The real world is messier, which is why we haven’t seen many successes from self play in LLMs yet. Btw @karpathy did great and I mostly agree with him!
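A tiny self-contained toy of the convergence claim (my own example, not from the thread): fictitious self-play on rock-paper-scissors, a finite two-player zero-sum game, drives both players' empirical strategies toward the minimax equilibrium, which here is the uniform mixture.

```python
# Toy illustration: self-play (fictitious play) in a two-player zero-sum game
# converges to the minimax strategy. For rock-paper-scissors the minimax
# strategy is the uniform mixture (1/3, 1/3, 1/3).
import numpy as np

A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)   # row player's payoff matrix

counts1 = np.ones(3)   # empirical action counts for each player
counts2 = np.ones(3)

for _ in range(100_000):
    p1 = counts1 / counts1.sum()
    p2 = counts2 / counts2.sum()
    # each player best-responds to the opponent's empirical mixture
    a1 = np.argmax(A @ p2)     # row player maximizes expected payoff
    a2 = np.argmin(p1 @ A)     # column player minimizes it
    counts1[a1] += 1
    counts2[a2] += 1

print(np.round(counts1 / counts1.sum(), 3))  # ~[0.333, 0.333, 0.333] = minimax
print(np.round(counts2 / counts2.sum(), 3))
```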



People ask how to get hired by frontier labs? Understand and be able to produce every detail 👇

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…



Wenting Zhao reposted

Talk from Wenting Zhao of Qwen on their plans during COLM. Seems like 1 word is the plan still: scaling training up! Let’s go.


I was really looking forward to being at #COLM2025 with Junyang, but visas take forever 😞 Come ask me about Qwen: what it's like to work here, what features you'd like to see, what bugs you'd like us to fix, or anything!

Sorry to miss COLM because my visa application fell through. @wzhao_nlp will be there to represent Qwen, give a talk, and join the panel discussion on reasoning and agents!



Want to hear some hot takes about the future of language modeling, and share your takes too? Stop by the Visions of Language Modeling workshop at COLM on Friday, October 10 in room 519A! There will be over a dozen speakers working on all kinds of problems in modeling language and…


Wenting Zhao reposted

When @ethansdyer and I joined Anthropic last Dec and spearheaded the discovery team, we decided to focus on unlocking computer-use as a bottleneck for scientific discovery. It has been incredible to work on improving computer-use and witness the fast progress. In OSWorld for…


Wenting Zhao reposted

🚨Modeling Abstention via Selective Help-seeking

LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs not?

@momergul_ introduces MASH that trains LLMs for search and gets abstentions for free!…


Really excited and proud to see Qwen models in the first batch of supported models for the Tinker service! 🤩 We will continue to release great models to grow research in the community 😎


Introducing Tinker: a flexible API for fine-tuning language models.

Write training loops in Python on your laptop; we'll run them on distributed GPUs.

Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!…



Wenting Zhao reposted

The most surprising thing about working on this was that RL with LoRA completely matches full training and develops the same extended reasoning patterns. I think this is a great sign for custom agent training.

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…

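For anyone who wants to reproduce the comparison on their own task, here is a generic sketch of swapping full fine-tuning for LoRA via PEFT. This is not Thinking Machines' setup; the base checkpoint, rank, and target modules are placeholder choices.

```python
# Minimal LoRA fine-tuning setup (generic sketch with placeholder hyperparameters,
# not the configuration used in the post). Only the low-rank adapters are trained;
# the base weights stay frozen, which makes the LoRA-vs-full-FT comparison cheap to run.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")  # any causal LM checkpoint

lora_cfg = LoraConfig(
    r=32,                       # adapter rank: the main capacity knob
    lora_alpha=64,              # scaling factor; alpha/r sets the effective adapter scale
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()   # typically well under 1% of the full model
# ...then plug `model` into the same RL / SFT training loop you would use for full fine-tuning.
```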

