Adithya Bhaskar
@AdithyaNLP
Third-year CS PhD candidate at Princeton University (@princeton_nlp @PrincetonPLI), previously CS undergrad at IIT Bombay
Language models that think, chat better. We used long CoT (with a reward model) for RLHF instead of math, and it just works. Llama-3.1-8B-Instruct + 14K examples beats GPT-4o (!) on chat & creative writing, & even Claude-3.7-Sonnet (thinking) on AlpacaEval2 and WildBench! Read on. 🧵 1/8
Text-to-image (T2I) models can generate rich supervision for visual learning, but generating subtle distinctions remains challenging. Fine-tuning helps, but too much tuning → overfitting and loss of diversity. How do we preserve fidelity without sacrificing diversity? (1/8)
Claude Skills shows performance benefits from leveraging LLM skill catalogs at inference time. Our previous work (linked in tweet 5/5 of this thread) showed the same 6 months ago! 🌟 Our new work, STAT, shows that leveraging skills during training can greatly help too‼️ E.g., Qwen can…
Check out our new work on making reasoning models think broadly! 🤔 We find a minimalist, surprisingly effective recipe to THINK for CHAT: RLVR + a strong reward model, trained on real-world prompts. This project was fun and surprised me in a few ways 👇 📌 We can run RL…
Thanks for tweeting our paper!! 😁
The paper shows that making models think before answering makes them chat better. It introduces RL with Model-rewarded Thinking (RLMT), which makes the model write a private plan, then the final reply. A separate reward model, trained from human preferences,…
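The mechanics, as far as the thread describes them, fit in a short sketch: the policy writes a private plan followed by a visible reply, a preference-trained reward model scores only the reply, and an online RL step reinforces the whole trajectory. This is a hypothetical Python outline, not the paper's code; the <plan> delimiter, the Rollout container, and the three callables (policy_generate, reward_model_score, policy_update) are all illustrative assumptions.

```python
# Hypothetical sketch of one RLMT training step, reconstructed only from the
# thread's description. All names below are illustrative stand-ins.
from dataclasses import dataclass


@dataclass
class Rollout:
    prompt: str
    plan: str       # private chain of thought, hidden from the user
    reply: str      # the visible answer the reward model judges
    reward: float = 0.0


def split_plan_and_reply(text: str) -> tuple[str, str]:
    # Assumes an illustrative "<plan>...</plan>" delimiter convention.
    head, _, tail = text.partition("</plan>")
    return head.removeprefix("<plan>").strip(), tail.strip()


def rlmt_step(policy_generate, reward_model_score, policy_update, prompt: str) -> Rollout:
    """One online-RL step: sample plan + reply, score the reply, reinforce."""
    raw = policy_generate(prompt)               # e.g. "<plan>outline</plan> answer"
    plan, reply = split_plan_and_reply(raw)
    reward = reward_model_score(prompt, reply)  # reward model sees only the reply
    policy_update(prompt, raw, reward)          # PPO/GRPO-style gradient step
    return Rollout(prompt, plan, reply, reward)
```

With stub callables (a fixed-string generator, a constant-reward scorer, a no-op updater) this runs end to end; a real training loop would wrap it over the ~14K prompts mentioned above with an online RL optimizer.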
Honored to be included in the list, thanks a lot!
7. Language Models that Think, Chat Better: A simple recipe, RL with Model-rewarded Thinking, makes small open models “plan first, answer second” on regular chat prompts and trains them with online RL against a preference reward. x.com/omarsar0/statu…
Top AI Papers of The Week (September 22-28):
- ATOKEN
- LLM-JEPA
- Code World Model
- Teaching LLMs to Plan
- Agents Research Environments
- Language Models that Think, Chat Better
- Embodied AI: From LLMs to World Models
Read on for more:
Thanks for your kind words!
Ever wonder why some AI chats feel robotic while others nail it? This new paper introduces a game-changer: Language Models that Think, Chat Better. They train AIs to "think" step-by-step before replying, crushing benchmarks. Mind blown? Let's dive in 👇
Thanks a lot for the shout-out! 😁
Thanks a lot for the tweet! We had a lot of fun working on this project! 😄
Language Models that Think, Chat Better "This paper shows that the RLVR paradigm is effective beyond verifiable domains, and introduces RL with Model-rewarded Thinking (RLMT) for general-purpose chat capabilities." "RLMT consistently outperforms standard RLHF pipelines. This…
Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.