Yifan Zhang @ NeurIPS

@yifan_zhang_

PhD student at @Princeton University, focusing on LLMs. Language Modeling and Pretraining, LLM Reasoning and RL. Prev @ Seed @UCLA @Tsinghua_IIIS

Pinned

🚀DeepSeek V3.2 officially utilized our corrected KL regularization term in their training objective! On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning (arxiv.org/abs/2505.17508) See also tinker-docs.thinkingmachines.ai/losses It will be even better if they can…

@deepseek_ai: 🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech…


Historical.

@lupantech: Join us at the 5th MATH-AI Workshop at @NeurIPSConf now, our biggest year yet with a record 249 submissions!! 🎉 ➡️ mathai2025.github.io We’re honored to host an incredible lineup of speakers: @swarat @WeizhuChen @j_dekoninck @Leonard41111588 @HannaHajishirzi @chijinML…


Hope you enjoyed yesterday’s poster! We were honored to have @SonglinYang4, @Xinyu2ML, @wen_kaiyue, and many other esteemed researchers visit and share their guidance! 🚀

Oh, Post-AGI Speakers

Mistral Large 675B, WE ARE SO BACK!

@MistralAI: Introducing the Mistral 3 family of models: Frontier intelligence at all sizes. Apache 2.0. Details in 🧵


Fantastic.

@uuujingfeng: Together with @yuxiangw_cs and Maryam Fazel, we are excited to present our tutorial "Theoretical Insights on Training Instability in Deep Learning" tomorrow at #NeurIPS2025! Link: uuujf.github.io/instability/ *picture generated by Gemini


LFG!

@YIFENGLIU_AI: I will go to NeurIPS 2025@San Diego during Dec. 2-7 for my spotlight paper "Tensor product attention is all you need", and I'm also excited to meet all of you there to discuss anything interesting, exciting and enlightening about AI, LLMs and next trend of innovation.


The Next Scaling Frontier: On the Scaling Laws of Technical Blogs

Today, OpenAI is launching a new Alignment Research blog: a space for publishing more of our work on alignment and safety more frequently, and for a technical audience. alignment.openai.com



Stanford NLP Group reposted it, thanks a lot! 😀 Kudos to open research! 🚀

@yifan_zhang_: 🚀DeepSeek V3.2 officially utilized our corrected KL regularization term in their training objective! On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning (arxiv.org/abs/2505.17508) See also tinker-docs.thinkingmachines.ai/losses It will be even better if they can…


Great work! Glad to see the Proposer (Generator)-Verifier system works so well. 🚀 Please refer to our previous work: arxiv.org/pdf/2308.04371 cumulative-reasoning.github.io

Olmo 3-32B uses GQA (KV heads 8) and SWA (3/4 layers) 🚀 allenai.org/papers/olmo3

@allen_ai: Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵


Gemini 3.0 Pro Evals

A new model without RoPE. Farewell.

seems like the new MoEs by @arcee_ai are coming soon, super excited for this release lfg

here is a recap of the modeling choice according to the transformers PR:
> MoE (2 shared experts, top-k=6, 64 total experts, sigmoid routing)
> GQA with gated attention
> NoPE on the global…



Worth reading. I had thought about similar ideas and am glad to see this actually works. 😀

@ZhiyuanZeng_: RL is bounded by finite data😣? Introducing RLVE: RL with Adaptive Verifiable Environments We scale RL with data procedurally generated from 400 envs dynamically adapting to the trained model 💡find supervision signals right at the LM capability frontier + scale them 🔗in🧵…


John Schulman just followed me. Clearly, this is the singularity! 🚀
