
MIT NLP

@nlp_mit

NLP Group at @MIT_CSAIL! PIs: @yoonrkim @jacobandreas @lateinteraction @pliang279 @david_sontag, Jim Glass, @roger_p_levy

Pinned

Hello everyone! We are quite a bit late to the Twitter party, but welcome to the MIT NLP Group account! Follow along for the latest research from our labs as we dive deep into language, learning, and logic 🤖📚🧠


MIT NLP reposted

🗞️ Dialogues with AI Reduce Beliefs in Misinformation but Build No Lasting Discernment Skills ➡️ While interactions with AI have been shown to durably reduce people’s beliefs in false information, it is unclear whether these interactions also teach people the skills to discern…


catch MIT NLP at @COLM_conf day 1!

morning:
@gabe_grand is presenting “Self-Steering Language Models”
@ben_lipkin is presenting “Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling”
@KaivuHariharan is presenting “Breakpoint:…


MIT NLP reposted

Good morning @COLM_conf! Excited to present our poster on Self-Steering LMs (#50, 11AM-1PM). If you’re thinking about codegen, probabilistic inference, or parallel scaling, stop by for a chat!


MIT NLP reposted

flying to 🇨🇦 this week for #COLM2025! catch us on friday to hear our talk about RLCR at the SCALR@COLM workshop. reach out to chat about test time compute, rl for interaction, and anything else!

It seems GPT‑OSS is very prone to hallucinations … check out our RLCR paper to see how we trained reasoning models to know what they don't know. Website 🌐 and code 💻 out today! rl-calibration.github.io 🚀



MIT NLP reposted

I will be giving a talk on RLCR at the SCALR@COLM workshop on Friday! Come learn how LLMs can be trained to reason about their own uncertainty. Always happy to chat about RL and related ideas (DMs open)!

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…

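For intuition on what a calibration-aware reward can look like, here is a minimal sketch (an assumption about the general idea, not the authors' exact implementation): score the answer for correctness, then subtract a Brier-score penalty on the confidence the model verbalizes, so confidently wrong answers are punished hardest.

```python
def calibration_aware_reward(is_correct: bool, stated_confidence: float) -> float:
    """Sketch of an RLCR-style reward: correctness plus a Brier-score
    calibration term on the model's verbalized confidence.

    Illustrative assumption only, not the paper's exact reward function.
    """
    correctness = 1.0 if is_correct else 0.0
    brier_penalty = (stated_confidence - correctness) ** 2
    return correctness - brier_penalty


# Confidently wrong answers receive the lowest reward:
print(calibration_aware_reward(True, 0.90))   #  0.99
print(calibration_aware_reward(False, 0.30))  # -0.09
print(calibration_aware_reward(False, 0.95))  # -0.9025
```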


Exciting new work by @alexisjross @megha_byte on AI + education for code!

New preprint on AI + Education! 🍎 “Modeling Student Learning with 3.8M Program Traces” 💻 When students code, their edits tell a story about their reasoning process: exploring, debugging, and tinkering 🧠 What can LMs learn from training on student edit sequences? 📚



MIT NLP reposted

🎉 Accepted @ EMNLP! We found surprising brittleness of SOTA RMs under minor input changes and proposed a method to improve them. Paper now updated with more results including an evaluation of GPT-4o (which displays similar brittleness) arxiv.org/abs/2503.11751

Robust reward models are critical for alignment/inference-time algos, auto eval, etc. (e.g. to prevent reward hacking which could render alignment ineffective). ⚠️ But we found that SOTA RMs are brittle 🫧 and easily flip predictions when the inputs are slightly transformed 🍃 🧵

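One lightweight way to probe this kind of brittleness is a consistency check: score the same chosen/rejected pair before and after a meaning-preserving rewrite of the input and count how often the preference flips. The sketch below is hypothetical; `score` and `transform` are placeholder hooks for a reward model and a paraphraser, not the paper's actual evaluation code.

```python
from typing import Callable, Iterable, Tuple


def preference_flip_rate(
    pairs: Iterable[Tuple[str, str, str]],   # (prompt, chosen, rejected)
    score: Callable[[str, str], float],      # placeholder reward-model scorer
    transform: Callable[[str], str],         # placeholder meaning-preserving rewrite
) -> float:
    """Fraction of pairs whose preference order flips after transforming
    the prompt. A generic robustness probe, not the paper's protocol."""
    flips, total = 0, 0
    for prompt, chosen, rejected in pairs:
        before = score(prompt, chosen) > score(prompt, rejected)
        rewritten = transform(prompt)
        after = score(rewritten, chosen) > score(rewritten, rejected)
        flips += int(before != after)
        total += 1
    return flips / max(total, 1)
```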


MIT NLP reposted

I am super excited to share that our new paper “REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing” has been accepted at #NeurIPS 2025. Paper: arxiv.org/abs/2505.18880 Demo: wx83.github.io/REGen/


MIT NLP reposted

a bit late – but my last PhD paper was accepted as an oral to #EMNLP2025! w/ @zhuci19 @stats_stephen Tommi Jaakkola

reasoning LMs improve by thinking for longer, but longer is not always better

*thought calibration* is an inference-time strategy for efficient test-time scaling

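As a rough illustration of the early-exit idea (not the paper's actual criterion), one can grow the chain of thought in chunks and stop once the provisional answer stabilizes; `generate_more_thinking` and `extract_answer` below are hypothetical helpers standing in for a reasoning model.

```python
def think_until_stable(prompt, generate_more_thinking, extract_answer,
                       patience: int = 2, max_chunks: int = 16):
    """Stop extending the reasoning trace once the answer stops changing.

    A generic sketch of inference-time early stopping; the thought-calibration
    paper defines its own stopping rule.
    """
    thoughts, last_answer, stable = [], None, 0
    for _ in range(max_chunks):
        thoughts.append(generate_more_thinking(prompt, thoughts))
        answer = extract_answer(prompt, thoughts)
        stable = stable + 1 if answer == last_answer else 0
        last_answer = answer
        if stable >= patience:  # answer unchanged for `patience` chunks
            break
    return last_answer
```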

MIT NLP reposted

All the recordings for the @GPU_MODE x @scaleml series are up as a playlist in case you missed it 😁 There's so much value in these ~8 hours of lectures, from proving quantization error bounds on a whiteboard to a deep-dive into GPU warp schedulers! Plz take advantage of it!


MIT NLP reposted

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇


MIT NLP reposted

Since my undergraduate days at CMU, I've been participating in puzzlehunts: involving complex, multi-step puzzles, lacking well-defined problem definitions, with creative and subtle hints and esoteric world knowledge, requiring language, spatial, and sometimes even physical…


Most problems have clear-cut instructions: solve for x, find the next number, choose the right answer. Puzzlehunts don’t. They demand creativity and lateral thinking. We introduce PuzzleWorld: a new benchmark of puzzlehunt problems challenging models to think creatively.



MIT NLP reposted

✨New work on mathematical reasoning and attribution is now on arXiv! When given charts and questions, multimodal LLMs generate answers but often lack attribution (which granular chart elements drove the answer). If it sounds interesting, please read arxiv.org/abs/2508.16850 🗞️


MIT NLP reposted

A bit late, but finally got around to posting the recorded and edited lecture videos for the **How to AI (Almost) Anything** course I taught at MIT in spring 2025.

Youtube playlist: youtube.com/watch?v=0MYt0u…

Course website and materials: mit-mi.github.io/how2ai-course/…

Today's AI can be…


MIT NLP reposted

It seems GPT‑OSS is very prone to hallucinations … check out our RLCR paper to see how we trained reasoning models to know what they don't know. Website 🌐 and code 💻 out today! rl-calibration.github.io 🚀


MIT NLP reposted

Scaling CLIP on English-only data is outdated now…

🌍We built a CLIP data curation pipeline for 300+ languages
🇬🇧We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳It’s time to drop the language filter!

📝arxiv.org/abs/2507.22062

[1/5] 🧵


🚨new paper alert!🚨 rl for calibration 🚀🚀🚀

fun new paper training LLMs to analyze their own uncertainty and be more calibrated in their confidence! arxiv.org/abs/2507.16806



MIT NLP reposted

fun new paper training LLMs to analyze their own uncertainty and be more calibrated in their confidence! arxiv.org/abs/2507.16806

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…



Check out this new paper training LLMs to analyze their own uncertainty and be more calibrated! from @MehulDamani2 @ishapuri101 @StewartSlocum1 @IdanShenfeld and co!

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…



MIT NLP reposted

I'm currently in Vancouver for #ICML2025 this week and will present our work, "Understanding the Emergence of Multimodal Representation Alignment" later today at 4:30pm. Come by to chat!

