Jiarui Yao

@ExplainMiracles

UIUC CS PhD, 24

Joined May 2023

24 Posts · 95 Followers · 544 Following

Jiarui Yao

@ExplainMiracles

Nov 8

Thrilled to share our paper MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning (arxiv.org/pdf/2505.24846) won an EMNLP 2025 Outstanding Paper Award! 🎉🎉 Huge congrats to the team @evangelinejy99 @RuiYang70669025 @YifanSun99 @FengLuo895614…

Jiarui Yao reposted

Rui Yang

@RuiYang70669025

Nov 7

Thrilled to share our paper (arxiv.org/pdf/2505.24846) won an EMNLP 2025 Outstanding Paper Award! 🎉🎉 Huge congrats to the team @evangelinejy99 @ExplainMiracles @YifanSun99 @FengLuo895614 @rui4research, and big thanks to our advisors Prof. Tong Zhang and @hanzhao_ml!

Jiarui Yao

@ExplainMiracles

Nov 5

I am at EMNLP 2025 HPC-AI! #emnlp2025 #hpcai

Jiarui Yao

@ExplainMiracles

Sep 19

Glad that our paper has been accepted to NeurIPS 2025! Using gradient variance minimization (GVM), we balance the training data by difficulty and by each sample's contribution to the model, and we achieve improvements on math reasoning. Please check the original post for more details.

Jiarui Yao

@ExplainMiracles

May 6

We introduce Gradient Variance Minimization (GVM)-RAFT, a principled dynamic sampling strategy that minimizes gradient variance to improve the efficiency of chain-of-thought (CoT) training in LLMs. – Achieves 2–4× faster convergence than RAFT – Improves accuracy on math…

Jiarui Yao reposted

SyntaxMakerQQQ

@peixuanhakhan

Sep 10

(1/5) Super excited to release our new paper on Reinforcement Learning: "Self-Aligned Reward: Towards Effective and Efficient Reasoners"! Preprint: arxiv.org/pdf/2509.05489

Jiarui Yao reposted

Cheng Qian @ EMNLP2025

@qiancheng1231

Aug 1

🤝 Can LLM agents really understand us? We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands. 📄 arxiv.org/pdf/2507.22034 💻 github.com/SalesforceAIRe…

Jiarui Yao reposted

Yong Lin

@Yong18850571

Jul 15

(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…

Jiarui Yao reposted

Noam Razin

@noamrazin

Jul 11

Reward models (RMs) are key to language model post-training and inference pipelines. But, little is known about the relative pros and cons of different RM types. 📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs 🧵 1/6

Jiarui Yao reposted

Shulin Tian

@shulin_tian

Jun 17

🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder. 💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits? We introduce 👓Ego-R1: A framework…

Jiarui Yao reposted

Xiusi Chen

@xiusi_chen

Jun 4

Can LLMs make rational decisions like human experts? 📖Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker We introduce a novel framework that constructs a semantically grounded decision space to evaluate trade-offs in hard decision-making…

Jiarui Yao reposted

SyntaxMakerQQQ

@peixuanhakhan

May 30

(1/5) Want to make your LLM a skilled persuader? Check out our latest paper: "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"! For details: 📄Arxiv: arxiv.org/pdf/2505.22961 🛠️GitHub: github.com/ulab-uiuc/ToMAP

Jiarui Yao reposted

Cheng Qian @ EMNLP2025

@qiancheng1231

May 27

📢 New Paper Drop: From Solving to Modeling! LLMs can solve math problems — but can they model the real world? 🌍 📄 arXiv: arxiv.org/pdf/2505.15068 💻 Code: github.com/qiancheng0/Mod… Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.

Jiarui Yao reposted

Hanze Dong

@hendrydong

May 9

How to improve test-time scalability? - Separate thinking & solution phases to control performance under a budget constraint - Budget-Constrained Rollout + GRPO - Outperforms baselines on math/code - Cuts token usage by 30% without hurting performance huggingface.co/papers/2505.05…

Paper page - Scalable Chain of Thoughts via Elastic Reasoning

Source: huggingface.co

Jiarui Yao reposted

Xiusi Chen

@xiusi_chen

May 6

🚀 Can we cast reward modeling as a reasoning task? 📖 Introducing our new paper: RM-R1: Reward Modeling as Reasoning 📑 Paper: arxiv.org/pdf/2505.02387 💻 Code: github.com/RM-R1-UIUC/RM-… Inspired by recent advances of long chain-of-thought (CoT) on reasoning-intensive tasks, we…

Jiarui Yao

@ExplainMiracles

May 6

Jiarui Yao reposted

Haocheng Xi

@HaochengXiUCB

May 5

Thrilled to announce that our paper Sparse VideoGen got into #ICML2025! 🎉 Our new approach speeds up video generation by 2×. Details in the thread/paper. Huge thanks to my collaborators! Blog: svg-project.github.io Paper: arxiv.org/abs/2502.01776 Code:…

Haocheng Xi

@HaochengXiUCB

Mar 12

🚀 Introducing #SparseVideoGen: 2x speedup in video generation with HunyuanVideo with high pixel-level fidelity (PSNR = 29)! No training is required, no perceptible difference to the human eye! Blog: svg-project.github.io Paper: arxiv.org/abs/2502.01776 Code:…

Jiarui Yao reposted

Manling Li

@ManlingLi_

May 3

Join our tutorial on Foundation Models Meet Embodied Agents, with @YunzhuLiYZ @maojiayuan @wenlong_huang! Website: …models-meet-embodied-agents.github.io

Jiarui Yao reposted

Shizhe Diao

@shizhediao

Apr 18

Thrilled to share my first project at NVIDIA! ✨ Today’s language models are pre-trained on vast and chaotic Internet texts, but these texts are unstructured and poorly understood. We propose CLIMB — Clustering-based Iterative Data Mixture Bootstrapping — a fully automated…

Jiarui Yao

@ExplainMiracles

Apr 17

Negative samples are "not that important", while removing samples with all negative outputs is "important". 🤣

Hanze Dong

@hendrydong

Apr 16

🤖What makes GRPO work? Rejection Sampling→Reinforce→GRPO - RS is underrated - Key of GRPO: implicitly remove prompts without correct answer - Reinforce+Filtering > GRPO (better KL) 💻github.com/RLHFlow/Minima… 📄arxiv.org/abs/2504.11343 👀RAFT was invited to ICLR25! Come & Chat☕️
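
One way to read "implicitly remove prompts without correct answer" is sketched below. It assumes the commonly described GRPO advantage, where each sampled response's reward is normalized by the mean and standard deviation of its own prompt group; the function name `group_advantages`, the `eps` value, and the 0/1 rewards are illustrative assumptions, not taken from the linked paper or repository. Under that assumption, a prompt whose samples are all wrong gets zero advantage for every sample, so it contributes no policy-gradient signal.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled responses.

    Hypothetical helper: (r - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Prompt where every sampled answer is wrong: all rewards are 0.
all_wrong = group_advantages([0.0, 0.0, 0.0, 0.0])

# Prompt with one correct answer among four samples.
mixed = group_advantages([1.0, 0.0, 0.0, 0.0])

print(all_wrong)  # every advantage is 0.0, so the prompt adds no gradient
print(mixed)      # positive for the correct sample, negative for the rest
```

The same holds for a group where every sample is correct, since identical rewards always normalize to zero.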