
Zaid Khan

@codezakh

NDSEG Fellow / PhD @uncnlp with @mohitban47, working on automating env/data generation + program synthesis. Formerly @allenai @neclabsamerica.

Pinned

How can an agent reverse engineer the underlying laws of an unknown, hostile & stochastic environment in “one life”, without millions of steps + human-provided goals / rewards? In our work, we: 1️⃣ infer an executable symbolic world model (a probabilistic program capturing…


Zaid Khan reposted

Multimodal LLMs (MLLMs) excel at reasoning, layout understanding, and planning—yet in diffusion-based generation, they are often reduced to simple multimodal encoders. What if MLLMs could reason directly in latent space and guide diffusion generation with fine-grained,…


Big congrats to Mohit on becoming an ACL Fellow! 🥳 He's been a tireless researcher and mentor and seeing it recognized makes me happy 🥲👏

Deeply happy and honored to be elected as an ACL Fellow -- and to be a part of the respected cohort of this+past years' fellows (congrats everyone)! 🙏 All the credit (and sincere gratitude) to all my amazing students, postdocs, collaborators, mentors, and family! 🤗💙




Zaid Khan reposted

🚨 Excited to share DART, a multi-agent multimodal debate framework that uses disagreement between VLM agents to address visual uncertainty. VLM debate stagnates and VLMs can struggle with which tools to call – we use disagreement to recruit visual tools (e.g. OCR, spatial…


Zaid Khan reposted

🚨 Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding 🚨 Introducing Active Video Perception: an evidence-seeking framework that treats the video as an interactive environment and acquires compact, query-relevant evidence. 🎬 Key…


Zaid Khan reposted

How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on…


Zaid Khan reposted

The OpenThoughts team is now tackling data for post-training agents! Our first RL environments and SFT trajectories datasets are just the start of our open research collaboration. I’m very excited for the path ahead. We have a great team assembled and have been working…

How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on…



Zaid Khan reposted

#NeurIPS2025 is live! I'll be in San Diego through Saturday (Dec 06) and would love to meet prospective graduate students interested in joining my lab at JHU. If you're excited about multimodal AI, robotics, unified models, learning action/motion from video, etc. let’s chat!…

Sharing some personal updates 🥳: - I've completed my PhD at @unccs! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug…



Zaid Khan reposted

🤔 We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze 👀, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video…


Zaid Khan reposted

🚨 Excited to be (remotely) giving a talk tomorrow 12/2 at the "Exploring Trust and Reliability in LLM Evaluation" #NeurIPS expo workshop! I’ll be presenting our work on pragmatic training to improve calibration and persuasion, and skill-based granular evaluation for data…


Zaid Khan reposted

🏖️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I will be presenting: Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents 🗓️ Thu 4 Dec 4:30 p.m. PT — 7:30 p.m. PT | Exhibit Hall C,D,E #4412 Excited to chat about our follow-up work on…

🎉 Excited to share that Bifrost-1 has been accepted to #NeurIPS2025! ☀️ Bridging MLLMs and diffusion into a unified multimodal understanding and generation model can be very costly to train. ✨ Bifrost-1 addresses this by leveraging patch-level CLIP latents that are natively…



Zaid Khan reposted

I will be at #NeurIPS2025 to present our work: "LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits". Come visit our poster: 🗓️ Thu 4 Dec, 4:30 p.m. – 7:30 p.m. PST | Exhibit Hall C,D,E #4108 Let's connect and chat about LLM post-training, inference-time…

Reward Models (RMs) are crucial for RLHF training, but: Using single RM: 1⃣ poor generalization, 2⃣ ambiguous judgements & 3⃣ over-optimization Using multiple RMs simultaneously: 1⃣ resource-intensive & 2⃣ susceptible to noisy/conflicting rewards 🚨We introduce ✨LASeR✨,…



Zaid Khan reposted

⛱️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I am on the industry job market & will be presenting: LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits (🗓️Dec 4, 4:30PM) Excited to chat about research (reasoning, LLM agents, post-training) & job…

🎉Excited to share that LASeR has been accepted to #NeurIPS2025!☀️ RLHF with a single reward model can be prone to reward-hacking while ensembling multiple RMs is costly and prone to conflicting rewards. ✨LASeR addresses this by using multi-armed bandits to select the most…
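The bandit idea described above — treating each reward model as an arm and learning online which one to trust — can be illustrated with a generic UCB1 sketch. This is a minimal, hypothetical example of multi-armed bandit selection, not LASeR's actual algorithm; the arm means and update rule are placeholders standing in for noisy feedback from candidate RMs.

```python
import math
import random

def ucb1_select(counts, values, t):
    """Return the arm index with the highest UCB1 score."""
    for i, n in enumerate(counts):
        if n == 0:  # try every arm (reward model) at least once
            return i
    return max(
        range(len(counts)),
        key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )

def run_bandit(arm_means, steps=1000, seed=0):
    """Simulate selecting among reward models with noisy utility feedback."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts, values = [0] * k, [0.0] * k
    for t in range(1, steps + 1):
        i = ucb1_select(counts, values, t)
        reward = rng.gauss(arm_means[i], 0.1)  # noisy signal from RM i
        counts[i] += 1
        values[i] += (reward - values[i]) / counts[i]  # running mean
    return counts

# Three hypothetical reward models with different (unknown) usefulness.
counts = run_bandit([0.2, 0.5, 0.8])
```

Over enough steps, the selector concentrates its pulls on the most useful arm while still occasionally exploring the others — the same trade-off that makes bandits attractive for picking among conflicting RMs without querying all of them every step.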



Zaid Khan reposted

🚨 Thrilled to share Prune-Then-Plan! - VLM-based EQA agents often move back-and-forth due to miscalibration. - Our Prune-Then-Plan method filters noisy frontier choices and delegates planning to coverage-based search. - This yields stable, calibrated exploration and…

🚨Introducing our new work, Prune-Then-Plan — a method that enables AI agents to better explore 3D scenes for embodied question answering (EQA). 🧵 1/2 🟥 Existing EQA systems leverage VLMs to drive exploration choice at each step by selecting the ‘best’ next frontier, but…



Zaid Khan reposted

🚨 Check out our generative process reward model, PRInTS, that improves agents' complex, long-horizon information-seeking capabilities via: 1⃣ novel MCTS-based fine-grained information-gain scoring across multiple dimensions. 2⃣ accurate step-level guidance based on compression…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agent’s long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Thanks @_akhaliq for posting about our work on guiding agents for long-horizon information-seeking tasks using a generative process reward model! For more details, see the original thread: x.com/ArchikiPrasad/…

PRInTS Reward Modeling for Long-Horizon Information Seeking




We want agents to solve problems that require searching and exploring multiple paths over long horizons, such as complex information seeking tasks which require the agent to answer questions by exploring the internet. Process Reward Models (PRMs) are a promising approach which…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agent’s long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

Long-horizon information-seeking tasks remain challenging for LLM agents, and existing PRMs (step-wise process reward models) fall short because: 1⃣ the reasoning process involves interleaved tool calls and responses 2⃣ the context grows rapidly due to the extended task horizon…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agent’s long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…


