
Zaid Khan

@codezakh

@uncnlp with @mohitban47, working on automating env/data generation + program synthesis; formerly @allenai, @neclabsamerica

Pinned

What if we could transform advanced math problems into abstract programs that can generate endless, verifiable problem variants?

Presenting EFAGen, which automatically transforms static advanced math problems into their corresponding executable functional abstractions (EFAs).…

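To make the idea concrete, here is a minimal, hypothetical sketch of what an executable functional abstraction could look like: a parameterized program that samples problem variants together with a programmatically verifiable answer. The function names and the problem template are illustrative, not EFAGen's actual output format.

```python
import random


# Hypothetical sketch of an executable functional abstraction (EFA): a
# parameterized program that generates problem variants plus a ground-truth
# answer that can be checked automatically. Illustrative only.
def quadratic_root_sum_efa(seed: int):
    """Generate a 'sum of the roots of a quadratic' variant with its answer."""
    rng = random.Random(seed)
    a = rng.randint(1, 9)
    b = rng.randint(-20, 20)
    c = rng.randint(-20, 20)
    problem = f"Find the sum of the roots of {a}x^2 + {b}x + {c} = 0."
    answer = -b / a  # Vieta's formulas give a verifiable ground truth.
    return problem, answer


def verify(candidate: float, answer: float, tol: float = 1e-9) -> bool:
    """Automatically check a model's candidate answer against the ground truth."""
    return abs(candidate - answer) < tol


for seed in range(3):
    problem, answer = quadratic_root_sum_efa(seed)
    print(problem, "->", answer)
```

Each seed yields a fresh, checkable variant, which is what makes the abstraction useful for generating training or evaluation data at scale.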

Zaid Khan reposted

🚨 New Paper Alert! Introducing SciVideoBench — a comprehensive benchmark for scientific video reasoning!

🔬SciVideoBench:

1. Spans Physics, Chemistry, Biology & Medicine with authentic experimental videos.

2. Features 1,000 challenging MCQs across three reasoning types:…


Zaid Khan reposted

We welcome Prof. Mohit Bansal (UNC Chapel Hill) as a keynote speaker at #CODS2025!

Director of UNC’s MURGe-Lab, he works on multimodal generative models, reasoning agents & faithful language generation. He is an AAAI Fellow, a PECASE recipient, and a multiple best-paper awardee.


Zaid Khan reposted

🚨 Thrilled to introduce Self-Improving Demonstrations (SID) for Goal-Oriented Vision-and-Language Navigation — a scalable paradigm where navigation agents learn to explore by teaching themselves.

➡️ Agents iteratively generate and learn from their own successful trajectories
➡️…

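As a toy illustration of the self-improving-demonstrations loop described above (roll out, keep only successful trajectories, imitate them, repeat), here is a runnable sketch; the 1-D walk environment, the agent, and the update rule are hypothetical stand-ins, not the paper's setup.

```python
import random

# Toy sketch of self-improvement from one's own successes. Entirely
# illustrative: the 1-D walk, the "agent", and the update rule are stand-ins.
class ToyAgent:
    def __init__(self):
        self.p_right = 0.5  # probability of stepping toward the goal

    def rollout(self, length=10, goal=5):
        pos, actions = 0, []
        for _ in range(length):
            step = 1 if random.random() < self.p_right else -1
            actions.append(step)
            pos += step
        return actions, pos >= goal  # (trajectory, success flag)

    def learn_from(self, trajectories):
        # "Fine-tune" on successful demonstrations: imitate their action statistics.
        steps = [a for traj in trajectories for a in traj]
        if steps:
            self.p_right = steps.count(1) / len(steps)


agent = ToyAgent()
for rnd in range(5):
    successes = [traj for traj, ok in (agent.rollout() for _ in range(200)) if ok]
    agent.learn_from(successes)
    print(f"round {rnd}: {len(successes)} successes, p_right={agent.p_right:.2f}")
```

The success count climbs round over round because each round's policy imitates only the trajectories that reached the goal.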

Zaid Khan reposted

Thanks for the shoutout! 🇨🇦 I’ll be at #COLM2025 presenting two papers:

GenerationPrograms (Attribution): Poster Session 4, Oct 8th, 4:30 PM
QAPyramid (Summarization Eval): Poster Session 5, Oct 9th, 11:00 AM

I’m also on the industry job market for research scientist roles.…

🚨 Check out our awesome students/postdocs' papers at #COLM2025 and say hi to them (several are on the job market or hiring) -->

-- Archiki, David are on the post-PhD job market!
-- Elias finished his postdoc & is now faculty at UT-Austin CS and looking to admit PhD students!…



Zaid Khan reposted

❗️Self-evolution is quietly pushing LLM agents off the rails. ⚠️ Even agents that are perfectly aligned at deployment can gradually forget human alignment and shift toward self-serving strategies. Over time, LLM agents stop following values, imitate bad strategies, and even spread misaligned…

🚨 Introducing ATP — the Alignment Tipping Process!
🔥 Beware! Self-evolution is gradually pushing LLM agents off the rails! Even agents that are perfectly aligned at deployment can gradually forget human alignment and shift toward self-serving strategies.

#AI #LLM #Agents #SelfEvolving #Alignment



Zaid Khan reposted

I am attending #COLM2025 🇨🇦 this week to present our work on:

Unit Test Generation: 📅 Oct 8th (Wed), 4:30 PM, #79
RAG with conflicting evidence: 📅 Oct 9th (Thu), 11 AM, #71

PS: I'm on the industry job market for RS roles, so you can reach me via DM or in person to chat! 😄

🚨 Check out our awesome students/postdocs' papers at #COLM2025 and say hi to them (several are on the job market or hiring) -->

-- Archiki, David are on the post-PhD job market!
-- Elias finished his postdoc & is now faculty at UT-Austin CS and looking to admit PhD students!…



Zaid Khan reposted

✈️ Arrived at #COLM2025 where I'll be helping to present the following 4 papers. I'm also recruiting multiple PhD students for my new lab at UT Austin -- happy to chat about research, PhD applications, or postdoc openings in my former postdoc lab at UNC!

-- Learning to Generate…

🚨 Check out our awesome students/postdocs' papers at #COLM2025 and say hi to them (several are on the job market or hiring) -->

-- Archiki, David are on the post-PhD job market!
-- Elias finished his postdoc & is now faculty at UT-Austin CS and looking to admit PhD students!…



Zaid Khan reposted

-- Postdoc openings info/details 👇 (flyer+links: cs.unc.edu/~mbansal/postd…)

Also, PhD admissions/openings info: cs.unc.edu/~mbansal/prosp…

x.com/mohitban47/sta…

🚨 We have postdoc openings at UNC 🙂

Exciting+diverse NLP/CV/ML topics**, freedom to create research agenda, competitive funding, very strong students, many collabs w/ other faculty & universities+companies, superb quality of life/weather. Please apply + help spread the word…



Zaid Khan reposted

(detailed links + websites + summary 🧵's of these papers attached below FYI 👇)

-- Learning to Generate Unit Tests for Automated Debugging. @ArchikiPrasad @EliasEskin @cyjustinchen @codezakh
arxiv.org/abs/2502.01619
x.com/ArchikiPrasad/…

🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨
which introduces ✨UTGen and UTDebug✨ for teaching LLMs to generate unit tests (UTs) and to debug code from the generated tests.

UTGen+UTDebug improve LLM-based code debugging by addressing 3 key…

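A rough, hypothetical sketch of the test-then-repair loop implied above: model-generated unit tests are executed against the candidate code, and any failures are fed back for another repair attempt. The llm_generate_tests/llm_repair callables are placeholders, not UTGen/UTDebug's actual interfaces.

```python
# Minimal sketch of unit-test-driven debugging. The llm_* helpers are
# hypothetical placeholders for model calls, not the paper's API.
def run_tests(code: str, tests: list[str]) -> list[str]:
    """Execute each generated test against the candidate code; return failures."""
    failures = []
    for test in tests:
        namespace = {}
        try:
            exec(code, namespace)   # load the candidate implementation
            exec(test, namespace)   # assertion-style unit test
        except Exception as err:
            failures.append(f"{test!r} failed: {err}")
    return failures


def debug_loop(code, problem, llm_generate_tests, llm_repair, max_rounds=3):
    tests = llm_generate_tests(problem)      # model-generated unit tests
    for _ in range(max_rounds):
        failures = run_tests(code, tests)
        if not failures:                     # all generated tests pass: stop
            return code
        code = llm_repair(code, failures)    # feed failures back to the model
    return code


# Quick demo of the test runner on a deliberately buggy function.
buggy = "def add(a, b):\n    return a - b  # bug"
print(run_tests(buggy, ["assert add(2, 3) == 5"]))
```

The key design point is that the generated tests double as both a stopping criterion and as structured feedback for the repair step.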


Zaid Khan reposted

🚨 "Think the right amount" for improving both reasoning accuracy and efficiency! --> Large reasoning models under-adapt = underthink on hard problems and overthink on easy ones --> ✨TRAAC✨ is an online RL, difficulty-adaptive, attention-based compression method that prunes…

🚨 Excited to announce TRAAC, an online difficulty-adaptive, attention-based method that handles the tradeoff of under & overthinking in reasoning models to improve both accuracy and efficiency.

Underthinking ❌: Models terminate reasoning too early on harder problems, leading…



Zaid Khan reposted

🚨 Check out our awesome students/postdocs' papers at #COLM2025 and say hi to them (several are on the job market or hiring) -->

-- Archiki, David are on the post-PhD job market!
-- Elias finished his postdoc & is now faculty at UT-Austin CS and looking to admit PhD students!…


Zaid Khan reposted

Large reasoning models suffer from under-adaptiveness: they underthink on hard problems and overthink on easy ones. TRAAC addresses this by introducing ✨difficulty calibration and attention-based compression✨ → +8.4% accuracy & +36.8% efficiency!

1️⃣ TRAAC adaptively mitigates…

🚨 Excited to announce TRAAC, an online difficulty-adaptive, attention-based method that handles the tradeoff of under & overthinking in reasoning models to improve both accuracy and efficiency.

Underthinking ❌: Models terminate reasoning too early on harder problems, leading…

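For intuition only, here is a small sketch of the two ingredients the thread names (difficulty calibration plus attention-based pruning of reasoning steps). The specific pruning rule and the pass-rate difficulty proxy below are illustrative guesses, not TRAAC's exact formulation.

```python
# Rough sketch of difficulty-adaptive, attention-based compression of a
# reasoning trace. Illustrative only; not TRAAC's actual algorithm.
def compress_trace(steps, attention_scores, pass_rate,
                   max_keep_frac=1.0, min_keep_frac=0.4):
    """Keep a larger fraction of reasoning steps on harder problems.

    steps: list of reasoning-step strings
    attention_scores: one importance score per step (e.g., derived from attention)
    pass_rate: fraction of sampled rollouts that solved the problem (easiness proxy)
    """
    difficulty = 1.0 - pass_rate  # low pass rate -> hard problem
    keep_frac = min_keep_frac + (max_keep_frac - min_keep_frac) * difficulty
    n_keep = max(1, round(keep_frac * len(steps)))
    # Keep the highest-attention steps, preserving their original order.
    ranked = sorted(range(len(steps)), key=lambda i: attention_scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [steps[i] for i in kept]


steps = ["restate problem", "set up equation", "digression", "solve", "check answer"]
scores = [0.30, 0.25, 0.02, 0.28, 0.15]
print(compress_trace(steps, scores, pass_rate=0.9))  # easy: prune aggressively
print(compress_trace(steps, scores, pass_rate=0.1))  # hard: keep most steps
```

Easy problems get short, heavily pruned traces while hard ones keep nearly everything, which is the under/overthinking tradeoff the tweet describes.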


Zaid Khan reposted

🚨 Excited to announce TRAAC, an online difficulty-adaptive, attention-based method that handles the tradeoff of under & overthinking in reasoning models to improve both accuracy and efficiency.

Underthinking ❌: Models terminate reasoning too early on harder problems, leading…


Zaid Khan reposted

🚨 Excited to announce Generalized Correctness Models (GCMs): 🔎 We find no special advantage in using an LLM to predict its own correctness; instead, LLMs benefit from learning to predict the correctness of many other models – becoming a GCM. Huge thanks to @vaidehi_patil_,…

🚨 Announcing Generalized Correctness Models (GCMs) 🚨 Finding that LLMs have little self-knowledge about their own correctness, we train an 8B GCM to predict the correctness of many models, which is more accurate than training model-specific CMs and outperforms a larger…

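To illustrate the "generalized" part, here is a hypothetical sketch of how correctness-model training data might be pooled across many answering models so that a single classifier learns to predict correctness for all of them. The record fields and prompt format are made up for illustration, not the paper's actual setup.

```python
# Illustrative sketch: pool (question, model answer, correct?) records from
# many different models and turn them into training pairs for one shared
# correctness classifier. Field names and prompt format are hypothetical.
def build_gcm_dataset(records):
    """records: iterable of dicts like
    {"model": "model-A", "question": ..., "answer": ..., "is_correct": True}
    Returns (input text, label) pairs for fine-tuning a correctness classifier."""
    dataset = []
    for r in records:
        prompt = (f"Question: {r['question']}\n"
                  f"Proposed answer: {r['answer']}\n"
                  "Is this answer correct?")
        dataset.append((prompt, int(r["is_correct"])))
    return dataset


records = [
    {"model": "model-A", "question": "2+2?", "answer": "4", "is_correct": True},
    {"model": "model-B", "question": "2+2?", "answer": "5", "is_correct": False},
]
for prompt, label in build_gcm_dataset(records):
    print(label, prompt.splitlines()[1])
```

Because the answering model's identity is not baked into the input, the trained predictor generalizes across models rather than learning one model's quirks.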


Can attest that this is true 🙂 and now there are RS emulators (github.com/LostCityRS/Ser…) which could be turned into a pretty cool environment to eval agents on — there are 100+ quests, lots of bosses to defeat, and a legible "tech tree" with smithing, crafting, etc.


Zaid Khan reposted

🚨 Introducing DINCO, a zero-resource calibration method for verbalized LLM confidence. We normalize over self-generated distractors to enforce coherence ➡️ better-calibrated and less saturated (more usable) confidence!

⚠️ Problem: Standard verbalized confidence is overconfident…

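As an illustrative reading of the normalization step (not DINCO's exact procedure), the sketch below takes raw verbalized confidences for the model's answer and for self-generated distractors and normalizes them so the candidates must compete, which deflates saturated scores.

```python
# Minimal sketch of distractor-normalized verbalized confidence. Illustrative
# reading of the idea only; the candidate set and scores are made up.
def normalized_confidence(verbalized: dict[str, float], answer: str) -> float:
    """verbalized maps each candidate (the model's answer plus self-generated
    distractors) to its raw verbalized confidence in [0, 1]. Normalizing
    enforces coherence: the candidates' confidences cannot all be near 1."""
    total = sum(verbalized.values())
    return verbalized[answer] / total if total > 0 else 0.0


raw = {"Paris": 0.95, "Lyon": 0.80, "Marseille": 0.70}  # raw scores are saturated
print(normalized_confidence(raw, "Paris"))  # ~0.39 after normalization
```

When the model is genuinely unsure, its distractors also attract high raw confidence, and normalization pulls the reported confidence down accordingly.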

Zaid Khan reposted

🚨 NuRL: Nudging the Boundaries of LLM Reasoning

-- GRPO improves LLM reasoning, but stays within the model's "comfort zone", i.e., hard samples (0% pass rate) remain unsolvable and contribute no meaningful gradients.
-- In NuRL, we show that "nudging" the LLM with…

🚨 NuRL: Nudging the Boundaries of LLM Reasoning

GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints…



Zaid Khan reposted

🚨 NuRL: Nudging the Boundaries of LLM Reasoning

GRPO improves LLM reasoning, but often within the model's "comfort zone": hard samples (w/ 0% pass rate) remain unsolvable and contribute zero learning signals. In NuRL, we show that "nudging" the LLM with self-generated hints…

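A hypothetical sketch of how "nudging" could slot into a GRPO-style data loop: prompts whose sampled rollouts all fail (0% pass rate, hence zero advantage) get a self-generated hint prepended before resampling. The sample_rollouts/generate_hint helpers and the stubbed demo are placeholders, not NuRL's actual interfaces.

```python
from collections import namedtuple

Rollout = namedtuple("Rollout", "correct")


# Hypothetical sketch: prompts with a 0% pass rate give GRPO no learning
# signal, so we "nudge" them with a self-generated hint and resample.
def prepare_batch(prompts, sample_rollouts, generate_hint, n_rollouts=8):
    batch = []
    for prompt in prompts:
        rollouts = sample_rollouts(prompt, n_rollouts)
        pass_rate = sum(r.correct for r in rollouts) / n_rollouts
        if pass_rate == 0.0:
            hinted = f"{prompt}\nHint: {generate_hint(prompt)}"
            rollouts = sample_rollouts(hinted, n_rollouts)  # retry with the hint
            batch.append((hinted, rollouts))
        else:
            batch.append((prompt, rollouts))
    return batch


# Stub helpers for the demo (stand-ins for a real policy model and hint generator).
def demo_sample(prompt, n):
    return [Rollout(correct=("Hint" in prompt)) for _ in range(n)]


def demo_hint(prompt):
    return "try working backwards from the goal"


print(prepare_batch(["hard problem"], demo_sample, demo_hint)[0][0])
```

The point of the nudge is simply to turn an all-failure group, which carries no gradient under group-relative advantages, into one with a mix of outcomes.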

Zaid Khan reposted

🚨 Announcing Generalized Correctness Models (GCMs) 🚨 Finding that LLMs have little self-knowledge about their own correctness, we train an 8B GCM to predict the correctness of many models, which is more accurate than training model-specific CMs and outperforms a larger…


Zaid Khan reposted

Shiny new building and good views from the new Virginia Tech campus in DC 😉 -- it was a pleasure to meet everyone and engage in exciting discussions about trustworthy agents, collaborative reasoning/privacy, and controllable multimodal generation -- thanks again @profnaren,…


Looking forward to giving this Distinguished Lecture at Virginia Tech on Friday & meeting the awesome faculty+students there! 🙂 Will also discuss some of our new work in multi-agent compositional privacy risks+mitigation, ambiguity/loophole exploitation by LLMs, and bridging…


