
MIT NLP

@nlp_mit

NLP Group at @MIT_CSAIL! PIs: @yoonrkim @jacobandreas @lateinteraction @pliang279 @david_sontag, Jim Glass, @roger_p_levy

Pinned

Hello everyone! We are quite a bit late to the Twitter party, but welcome to the MIT NLP Group account! Follow along for the latest research from our labs as we dive deep into language, learning, and logic 🤖📚🧠


MIT NLP reposted

🗞️ Dialogues with AI Reduce Beliefs in Misinformation but Build No Lasting Discernment Skills ➡️ While interactions with AI have been shown to durably reduce people’s beliefs in false information, it is unclear whether these interactions also teach people the skills to discern…


catch MIT NLP at @COLM_conf day 1!

morning:
@gabe_grand is presenting “Self-Steering Language Models”
@ben_lipkin is presenting “Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling”
@KaivuHariharan is presenting “Breakpoint:…


MIT NLP reposted

Good morning @COLM_conf! Excited to present our poster on Self-Steering LMs (#50, 11AM-1PM). If you’re thinking about codegen, probabilistic inference, or parallel scaling, stop by for a chat!


MIT NLP reposted

flying to 🇨🇦 this week for #COLM2025! catch us on friday to hear our talk about RLCR at the SCALR@COLM workshop. reach out to chat about test time compute, rl for interaction, and anything else!

It seems GPT‑OSS is very prone to hallucinations … check out our RLCR paper to see how we trained reasoning models to know what they don't know. Website 🌐 and code 💻 out today! rl-calibration.github.io 🚀



MIT NLP reposted

I will be giving a talk on RLCR at the SCALR@COLM workshop on Friday! Come learn how LLMs can be trained to reason about their own uncertainty. Always happy to chat about RL and related ideas (DMs open)!

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…

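For intuition on what a calibration-aware reward can look like, here is a minimal sketch (an assumption about the general idea, not the authors' exact implementation): score the answer for correctness, then subtract a Brier-score penalty on the confidence the model verbalizes, so confidently wrong answers are punished hardest.

```python
def calibration_aware_reward(is_correct: bool, stated_confidence: float) -> float:
    """Sketch of an RLCR-style reward: correctness plus a Brier-score
    calibration term on the model's verbalized confidence.

    Illustrative assumption only, not the paper's exact reward function.
    """
    correctness = 1.0 if is_correct else 0.0
    brier_penalty = (stated_confidence - correctness) ** 2
    return correctness - brier_penalty


# Confidently wrong answers receive the lowest reward:
print(calibration_aware_reward(True, 0.90))   #  0.99
print(calibration_aware_reward(False, 0.30))  # -0.09
print(calibration_aware_reward(False, 0.95))  # -0.9025
```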


Exciting new work by @alexisjross @megha_byte on AI + education for code!

New preprint on AI + Education! 🍎 “Modeling Student Learning with 3.8M Program Traces” 💻 When students code, their edits tell a story about their reasoning process: exploring, debugging, and tinkering 🧠 What can LMs learn from training on student edit sequences? 📚



MIT NLP reposted

🎉 Accepted @ EMNLP! We found surprising brittleness of SOTA RMs under minor input changes and proposed a method to improve them. Paper now updated with more results including an evaluation of GPT-4o (which displays similar brittleness) arxiv.org/abs/2503.11751

Robust reward models are critical for alignment/inference-time algos, auto eval, etc. (e.g. to prevent reward hacking which could render alignment ineffective). ⚠️ But we found that SOTA RMs are brittle 🫧 and easily flip predictions when the inputs are slightly transformed 🍃 🧵

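One lightweight way to probe this kind of brittleness is a consistency check: score the same chosen/rejected pair before and after a meaning-preserving rewrite of the input and count how often the preference flips. The sketch below is hypothetical; `score` and `transform` are placeholder hooks for a reward model and a paraphraser, not the paper's actual evaluation code.

```python
from typing import Callable, Iterable, Tuple


def preference_flip_rate(
    pairs: Iterable[Tuple[str, str, str]],   # (prompt, chosen, rejected)
    score: Callable[[str, str], float],      # placeholder reward-model scorer
    transform: Callable[[str], str],         # placeholder meaning-preserving rewrite
) -> float:
    """Fraction of pairs whose preference order flips after transforming
    the prompt. A generic robustness probe, not the paper's protocol."""
    flips, total = 0, 0
    for prompt, chosen, rejected in pairs:
        before = score(prompt, chosen) > score(prompt, rejected)
        rewritten = transform(prompt)
        after = score(rewritten, chosen) > score(rewritten, rejected)
        flips += int(before != after)
        total += 1
    return flips / max(total, 1)
```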


MIT NLP reposted

I am super excited to share that our new paper “REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing” has been accepted at #NeurIPS 2025. Paper: arxiv.org/abs/2505.18880 Demo: wx83.github.io/REGen/


MIT NLP reposted

a bit late – but my last PhD paper was accepted as an oral to #EMNLP2025! w/ @zhuci19 @stats_stephen Tommi Jaakkola

reasoning LMs improve by thinking for longer, but longer is not always better

*thought calibration* is an inference-time strategy for efficient test-time scaling

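As a rough illustration of the early-exit idea (not the paper's actual criterion), one can grow the chain of thought in chunks and stop once the provisional answer stabilizes; `generate_more_thinking` and `extract_answer` below are hypothetical helpers standing in for a reasoning model.

```python
def think_until_stable(prompt, generate_more_thinking, extract_answer,
                       patience: int = 2, max_chunks: int = 16):
    """Stop extending the reasoning trace once the answer stops changing.

    A generic sketch of inference-time early stopping; the thought-calibration
    paper defines its own stopping rule.
    """
    thoughts, last_answer, stable = [], None, 0
    for _ in range(max_chunks):
        thoughts.append(generate_more_thinking(prompt, thoughts))
        answer = extract_answer(prompt, thoughts)
        stable = stable + 1 if answer == last_answer else 0
        last_answer = answer
        if stable >= patience:  # answer unchanged for `patience` chunks
            break
    return last_answer
```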

MIT NLP reposted

All the recordings for the @GPU_MODE x @scaleml series are up as a playlist in case you missed it 😁 There's so much value in these ~8 hours of lectures, from proving quantization error bounds on a whiteboard to a deep-dive into GPU warp schedulers! Plz take advantage of it!


MIT NLP reposted

For agents to improve over time, they can’t afford to forget what they’ve already mastered. We found that supervised fine-tuning forgets more than RL when training on a new task! Want to find out why? 👇


MIT NLP reposted

Since my undergraduate days at CMU, I've been participating in puzzlehunts: involving complex, multi-step puzzles, lacking well-defined problem definitions, with creative and subtle hints and esoteric world knowledge, requiring language, spatial, and sometimes even physical…


Most problems have clear-cut instructions: solve for x, find the next number, choose the right answer. Puzzlehunts don’t. They demand creativity and lateral thinking. We introduce PuzzleWorld: a new benchmark of puzzlehunt problems challenging models to think creatively.



MIT NLP reposted

✨New work on mathematical reasoning and attribution is now on arXiv! When given charts and questions, multimodal LLMs generate answers but often lack attribution (which granular chart elements drove the answer). If it sounds interesting, please read arxiv.org/abs/2508.16850 🗞️


MIT NLP reposted

A bit late, but finally got around to posting the recorded and edited lecture videos for the **How to AI (Almost) Anything** course I taught at MIT in spring 2025.

Youtube playlist: youtube.com/watch?v=0MYt0u…

Course website and materials: mit-mi.github.io/how2ai-course/…

Today's AI can be…


MIT NLP reposted

It seems GPT‑OSS is very prone to hallucinations … check out our RLCR paper to see how we trained reasoning models to know what they don't know. Website 🌐 and code 💻 out today! rl-calibration.github.io 🚀


MIT NLP reposted

Scaling CLIP on English-only data is outdated now…

🌍We built a CLIP data curation pipeline for 300+ languages
🇬🇧We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳It’s time to drop the language filter!

📝arxiv.org/abs/2507.22062

[1/5] 🧵


🚨new paper alert!🚨 rl for calibration 🚀🚀🚀

fun new paper training LLMs to analyze their own uncertainty and be more calibrated in their confidence! arxiv.org/abs/2507.16806



MIT NLP reposted

fun new paper training LLMs to analyze their own uncertainty and be more calibrated in their confidence! arxiv.org/abs/2507.16806

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…



Check out this new paper training LLMs to analyze their own uncertainty and be more calibrated! from @MehulDamani2 @ishapuri101 @StewartSlocum1 @IdanShenfeld and co!

🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…



MIT NLP reposted

I'm currently in Vancouver for #ICML2025 this week and will present our work, "Understanding the Emergence of Multimodal Representation Alignment" later today at 4:30pm. Come by to chat!

