Zaid Khan

@codezakh

@uncnlp with @mohitban47 working on automating env/data generation + program synthesis formerly @allenai @neclabsamerica

Boston, USA

zaidkhan.me

Joined June 2023

471 posts 560 followers 890 following

Pinned

Zaid Khan

@codezakh

Apr 15

What if we could transform advanced math problems into abstract programs that can generate endless, verifiable problem variants? Presenting EFAGen, which automatically transforms static advanced math problems into their corresponding executable functional abstractions (EFAs).…

Zaid Khan reposted

Shoubin Yu

@shoubin621

Oct 10

🚨 New Paper Alert! Introducing SciVideoBench — a comprehensive benchmark for scientific video reasoning! 🔬SciVideoBench: 1. Spans Physics, Chemistry, Biology & Medicine with authentic experimental videos. 2. Features 1,000 challenging MCQs across three reasoning types:…

Zaid Khan reposted

CODS

@ikddcods

Oct 9

We welcome Prof. Mohit Bansal (UNC Chapel Hill) as a keynote speaker at #CODS2025! Director of UNC’s MURGe-Lab, he works in multimodal generative models, reasoning agents & faithful language generation. He is an AAAI Fellow, PECASE and multiple best paper awardee.

Zaid Khan reposted

Zun Wang

@ZunWang919

Oct 8

🚨 Thrilled to introduce Self-Improving Demonstrations (SID) for Goal-Oriented Vision-and-Language Navigation — a scalable paradigm where navigation agents learn to explore by teaching themselves. ➡️ Agents iteratively generate and learn from their own successful trajectories ➡️…

Zaid Khan reposted

David Wan

@meetdavidwan

Oct 7

Thanks for the shoutout! 🇨🇦I’ll be at #COLM2025 presenting two papers: GenerationPrograms (Attribution): Poster Session 4, Oct 8th, 4:30 PM QAPyramid (Summarization Eval): Poster Session 5, Oct 9th, 11:00 AM I’m also on the industry job market for research scientist roles.…

Mohit Bansal

@mohitban47

Oct 6

🚨 Check out our awesome students/postdocs' papers at #COLM2025 and say hi to them (several are on the job market or hiring) --> -- Archiki, David are on the post-PhD job market! -- Elias finished his postdoc & is now faculty at UT-Austin CS and looking to admit PhD students!…

Zaid Khan reposted

Huaxiu Yao

@HuaxiuYaoML

Oct 7

❗️Self-evolution is quietly pushing LLM agents off the rails. ⚠️ Even perfect alignment at deployment can gradually forget human alignment and shift toward self-serving strategies. Over time, LLM agents stop following values, imitate bad strategies, and even spread misaligned…

Siwei Han

@lillianwei423

Oct 7

🚨 Introducing ATP — Alignment Tipping Process! 🔥 Beware! Self-Evolution is gradually pushing LLM Agents off the rails! Even perfect alignment at deployment can gradually forget human alignment and shift toward self-serving strategies. #AI #LLM #Agents #SelfEvolving #Alignment…

Zaid Khan reposted

Archiki Prasad

@ArchikiPrasad

Oct 7

I am attending #COLM2025 🇨🇦 this week to present our work on: Unit Test Generation: 📅 Oct 8th (Wed), 4:30 PM, #79 RAG with conflicting evidence: 📅 Oct 9th (Thu), 11 AM, #71 PS: I'm on the industry job market for RS roles, so you can reach me via DM or in-person to chat! 😄

Mohit Bansal

@mohitban47

Oct 6

Zaid Khan reposted

Elias Stengel-Eskin

@EliasEskin

Oct 7

✈️ Arrived at #COLM2025 where I'll be helping to present the following 4 papers. I'm also recruiting multiple PhD students for my new lab at UT Austin -- happy to chat about research, PhD applications, or postdoc openings in my former postdoc lab at UNC! -- Learning to Generate…

Mohit Bansal

@mohitban47

Oct 6

Zaid Khan reposted

Mohit Bansal

@mohitban47

Oct 7

-- Postdoc openings info/details 👇 (flyer+links: cs.unc.edu/~mbansal/postd…) Also, PhD admissions/openings info: cs.unc.edu/~mbansal/prosp… x.com/mohitban47/sta…

Mohit Bansal

@mohitban47

Apr 29, 2024

🚨 We have postdoc openings at UNC 🙂 Exciting+diverse NLP/CV/ML topics**, freedom to create research agenda, competitive funding, very strong students, many collabs w/ other faculty & universities+companies, superb quality of life/weather. Please apply + help spread the word…

Zaid Khan reposted

Mohit Bansal

@mohitban47

Oct 6

(detailed links + websites + summary 🧵's of these papers attached below FYI 👇) -- Learning to Generate Unit Tests for Automated Debugging. @ArchikiPrasad @EliasEskin @cyjustinchen @codezakh arxiv.org/abs/2502.01619 x.com/ArchikiPrasad/…

Archiki Prasad

@ArchikiPrasad

Feb 4

🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨 which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and debugging code from generated tests. UTGen+UTDebug improve LLM-based code debugging by addressing 3 key…

Zaid Khan reposted

Mohit Bansal

@mohitban47

Oct 5

🚨 "Think the right amount" for improving both reasoning accuracy and efficiency! --> Large reasoning models under-adapt = underthink on hard problems and overthink on easy ones --> ✨TRAAC✨ is an online RL, difficulty-adaptive, attention-based compression method that prunes…

Joykirat

@joykiratsingh

Oct 3

🚨 Excited to announce TRAAC, an online difficulty-adaptive, attention-based method that handles the tradeoff of under & overthinking in reasoning models to improve both accuracy and efficiency. Underthinking ❌: Models terminate reasoning too early on harder problems, leading…

Zaid Khan reposted

Mohit Bansal

@mohitban47

Oct 6

Zaid Khan reposted

Justin Chih-Yao Chen

@cyjustinchen

Oct 3

Large reasoning models suffer from under-adaptiveness, which underthink on hard problems and overthink on easy ones. TRAAC addresses this by introducing ✨difficulty calibration and attention-based compression✨→ +8.4% accuracy & +36.8% efficiency! 1️⃣ TRAAC adaptively mitigates…

Joykirat

@joykiratsingh

Oct 3

Zaid Khan reposted

Joykirat

@joykiratsingh

Oct 3

Zaid Khan reposted

Hanqi Xiao

@hanqi_xiao

Sep 30

🚨Excited to announce General Correctness Models (GCM): 🔎We find no special advantage using an LLM to predict its own correctness, instead finding that LLMs benefit from learning to predict the correctness of many other models – becoming a GCM. Huge thanks to @vaidehi_patil_,…

Elias Stengel-Eskin

@EliasEskin

Sep 30

🚨 Announcing Generalized Correctness Models (GCMs) 🚨Finding that LLMs have little self knowledge about their own correctness, we train an 8B GCM to predict correctness of many models, which is more accurate than training model-specific CMs, and outperforms a larger…

Zaid Khan

@codezakh

Oct 2

Can attest that this is true 🙂 and now there are RS emulators (github.com/LostCityRS/Ser…) which could be turned into a pretty cool environment to eval agents on — there are 100+ quests, lots of bosses to defeat, a legible "tech tree" with smithing, crafting etc

GitHub - LostCityRS/Server: Setup scripts for Engine + Content

Source: github.com

Alex Reibman 🖇️

@AlexReibman

Sep 10

x.com/i/article/1965…

Zaid Khan reposted

Elias Stengel-Eskin

@EliasEskin

Oct 2

🚨 Introducing DINCO, a zero-resource calibration method for verbalized LLM confidence. We normalize over self-generated distractors to enforce coherence ➡️ better-calibrated and less saturated (more usable) confidence! ⚠️ Problem: Standard verbalized confidence is overconfident…

Zaid Khan reposted

Mohit Bansal

@mohitban47

Oct 2

🚨 NuRL: Nudging the Boundaries of LLM Reasoning -- GRPO improves LLM reasoning, but stays within the model's "comfort zone" i.e., hard samples (0% pass rate) remain unsolvable and contribute no meaningful gradients. -- In NuRL, we show that "nudging" the LLM with…

Justin Chih-Yao Chen

@cyjustinchen

Oct 1

🚨 NuRL: Nudging the Boundaries of LLM Reasoning GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints…

Zaid Khan reposted

Justin Chih-Yao Chen

@cyjustinchen

Oct 1

Zaid Khan reposted

Elias Stengel-Eskin

@EliasEskin

Sep 30

Zaid Khan reposted

Mohit Bansal

@mohitban47

Sep 26

Shiny new building and good views from the new VirginiaTech campus in DC 😉 -- it was a pleasure to meet everyone and engage in exciting discussions about trustworthy agents, collaborative reasoning/privacy, and controllable multimodal generation -- thanks again @profnaren,…

Mohit Bansal

@mohitban47

Sep 23

Looking forward to giving this Distinguished Lecture at Virginia Tech on Friday & meeting the awesome faculty+students there! 🙂 Will also discuss some of our new work in multi-agent compositional privacy risks+mitigation, ambiguity/loophole exploitation by LLMs, and bridging…