
Zaid Khan

@codezakh

NDSEG Fellow / PhD @uncnlp with @mohitban47, working on automating env/data generation + program synthesis. Formerly @allenai @neclabsamerica.

Pinned

How can an agent reverse engineer the underlying laws of an unknown, hostile & stochastic environment in “one life”, without millions of steps + human-provided goals / rewards? In our work, we: 1️⃣ infer an executable symbolic world model (a probabilistic program capturing…


Zaid Khan reposted

Multimodal LLMs (MLLMs) excel at reasoning, layout understanding, and planning—yet in diffusion-based generation, they are often reduced to simple multimodal encoders. What if MLLMs could reason directly in latent space and guide diffusion generation with fine-grained,…


Big congrats to Mohit on becoming an ACL Fellow! 🥳 He's been a tireless researcher and mentor and seeing it recognized makes me happy 🥲👏

Deeply happy and honored to be elected as an ACL Fellow -- and to be a part of the respected cohort of this+past years' fellows (congrats everyone)! 🙏 All the credit (and sincere gratitude) to all my amazing students, postdocs, collaborators, mentors, and family! 🤗💙




Zaid Khan reposted

🚨 Excited to share DART, a multi-agent multimodal debate framework that uses disagreement between VLM agents to address visual uncertainty. VLM debate stagnates and VLMs can struggle with which tools to call – we use disagreement to recruit visual tools (e.g. OCR, spatial…


Zaid Khan reposted

🚨 Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding 🚨 Introducing Active Video Perception: an evidence-seeking framework that treats the video as an interactive environment and acquires compact, query-relevant evidence. 🎬 Key…


Zaid Khan reposted

How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on…


Zaid Khan reposted

The OpenThoughts team is now tackling data for post-training agents! Our first RL environments and SFT trajectories datasets are just the start of our open research collaboration. I’m very excited for the path ahead. We have a great team assembled and have been working…

How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on…



Zaid Khan reposted

#NeurIPS2025 is live! I'll be in San Diego through Saturday (Dec 06) and would love to meet prospective graduate students interested in joining my lab at JHU. If you're excited about multimodal AI, robotics, unified models, learning action/motion from video, etc. let’s chat!…

Sharing some personal updates 🥳: - I've completed my PhD at @unccs! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug…



Zaid Khan reposted

🤔 We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze 👀, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video…


Zaid Khan reposted

🚨 Excited to be (remotely) giving a talk tomorrow 12/2 at the "Exploring Trust and Reliability in LLM Evaluation" #NeurIPS expo workshop! I’ll be presenting our work on pragmatic training to improve calibration and persuasion, and skill-based granular evaluation for data…


Zaid Khan reposted

🏖️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I will be presenting: Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents 🗓️ Thu 4 Dec 4:30 p.m. PT — 7:30 p.m. PT | Exhibit Hall C,D,E #4412 Excited to chat about our follow-up work on…

🎉 Excited to share that Bifrost-1 has been accepted to #NeurIPS2025! ☀️ Bridging MLLMs and diffusion into a unified multimodal understanding and generation model can be very costly to train. ✨ Bifrost-1 addresses this by leveraging patch-level CLIP latents that are natively…



Zaid Khan reposted

I will be at #NeurIPS2025 to present our work: "LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits". Come visit our poster: 🗓️ Thu 4 Dec, 4:30 p.m. – 7:30 p.m. PST | Exhibit Hall C,D,E #4108 Let's connect and chat about LLM post-training, inference-time…

Reward Models (RMs) are crucial for RLHF training, but: Using single RM: 1⃣ poor generalization, 2⃣ ambiguous judgements & 3⃣ over-optimization Using multiple RMs simultaneously: 1⃣ resource-intensive & 2⃣ susceptible to noisy/conflicting rewards 🚨We introduce ✨LASeR✨,…



Zaid Khan reposted

⛱️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I am on the industry job market & will be presenting: LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits (🗓️Dec 4, 4:30PM) Excited to chat about research (reasoning, LLM agents, post-training) & job…

🎉Excited to share that LASeR has been accepted to #NeurIPS2025!☀️ RLHF with a single reward model can be prone to reward-hacking while ensembling multiple RMs is costly and prone to conflicting rewards. ✨LASeR addresses this by using multi-armed bandits to select the most…
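The bandit idea described above — treating each reward model as an arm and learning online which one to trust — can be illustrated with a generic UCB1 sketch. This is a minimal, hypothetical example of multi-armed bandit selection, not LASeR's actual algorithm; the arm means and update rule are placeholders standing in for noisy feedback from candidate RMs.

```python
import math
import random

def ucb1_select(counts, values, t):
    """Return the arm index with the highest UCB1 score."""
    for i, n in enumerate(counts):
        if n == 0:  # try every arm (reward model) at least once
            return i
    return max(
        range(len(counts)),
        key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]),
    )

def run_bandit(arm_means, steps=1000, seed=0):
    """Simulate selecting among reward models with noisy utility feedback."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts, values = [0] * k, [0.0] * k
    for t in range(1, steps + 1):
        i = ucb1_select(counts, values, t)
        reward = rng.gauss(arm_means[i], 0.1)  # noisy signal from RM i
        counts[i] += 1
        values[i] += (reward - values[i]) / counts[i]  # running mean
    return counts

# Three hypothetical reward models with different (unknown) usefulness.
counts = run_bandit([0.2, 0.5, 0.8])
```

Over enough steps, the selector concentrates its pulls on the most useful arm while still occasionally exploring the others — the same trade-off that makes bandits attractive for picking among conflicting RMs without querying all of them every step.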



Zaid Khan reposted

🚨 Thrilled to share Prune-Then-Plan! - VLM-based EQA agents often move back-and-forth due to miscalibration. - Our Prune-Then-Plan method filters noisy frontier choices and delegates planning to coverage-based search. - This yields stable, calibrated exploration and…

🚨Introducing our new work, Prune-Then-Plan — a method that enables AI agents to better explore 3D scenes for embodied question answering (EQA). 🧵 1/2 🟥 Existing EQA systems leverage VLMs to drive exploration choice at each step by selecting the ‘best’ next frontier, but…



Zaid Khan reposted

🚨 Check out our generative process reward model, PRInTS, that improves agents' complex, long-horizon information-seeking capabilities via: 1⃣ novel MCTS-based fine-grained information-gain scoring across multiple dimensions. 2⃣ accurate step-level guidance based on compression…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agent’s long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Thanks @_akhaliq for posting about our work on guiding agents for long-horizon information-seeking tasks using a generative process reward model! For more details, see the original thread: x.com/ArchikiPrasad/…

PRInTS Reward Modeling for Long-Horizon Information Seeking




We want agents to solve problems that require searching and exploring multiple paths over long horizons, such as complex information seeking tasks which require the agent to answer questions by exploring the internet. Process Reward Models (PRMs) are a promising approach which…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agent’s long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

Long-horizon information-seeking tasks remain challenging for LLM agents, and existing PRMs (step-wise process reward models) fall short because: 1⃣ the reasoning process involves interleaved tool calls and responses 2⃣ the context grows rapidly due to the extended task horizon…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agent’s long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…


