
Zaid Khan

@codezakh

NDSEG Fellow / PhD @uncnlp with @mohitban47, working on automating env/data generation + program synthesis. Formerly @allenai @neclabsamerica.

Pinned

How can an agent reverse engineer the underlying laws of an unknown, hostile & stochastic environment in “one life”, without millions of steps + human-provided goals / rewards? In our work, we: 1️⃣ infer an executable symbolic world model (a probabilistic program capturing…
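
To make the pinned idea concrete, here is a minimal sketch (the toy environment and all names are invented for illustration, not taken from the paper): maintain a small space of candidate symbolic dynamics rules and re-weight them online from a single episode's observations, with no resets and no external reward.

```python
import math
import random

# Toy illustration (environment and hypothesis space invented for this sketch):
# re-weight candidate symbolic dynamics rules online from one life's
# observations, with no resets and no external reward signal.

def rule_slip(p_slip):
    """Candidate dynamics: an action succeeds w.p. 1 - p_slip, else no-op."""
    def loglik(state, action, next_state):
        moved = next_state != state
        p = (1 - p_slip) if moved else p_slip
        return math.log(max(p, 1e-9))  # guard against log(0)
    return loglik

# Stand-in hypothesis space: a one-parameter family instead of full programs.
hypotheses = {f"slip={p:.1f}": rule_slip(p) for p in (0.0, 0.1, 0.3, 0.5)}
log_post = {h: 0.0 for h in hypotheses}  # uniform prior, log-space

def observe(state, action, next_state):
    """Bayesian update of the posterior over candidate world models."""
    for h, loglik in hypotheses.items():
        log_post[h] += loglik(state, action, next_state)

# One "life" in a 1-D world whose true slip probability is 0.3.
state = 0
for _ in range(50):
    action = random.choice([-1, 1])
    next_state = state + action if random.random() > 0.3 else state
    observe(state, action, next_state)
    state = next_state

print(max(log_post, key=log_post.get))  # usually recovers slip=0.3
```

In the actual work the hypothesis space is a space of probabilistic programs rather than a one-parameter family, but the online re-weighting from a single trajectory is the same shape of problem.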


Zaid Khan reposted

#NeurIPS2025 is live! I'll be in San Diego through Saturday (Dec 06) and would love to meet prospective graduate students interested in joining my lab at JHU. If you're excited about multimodal AI, robotics, unified models, learning action/motion from video, etc., let's chat!…

Sharing some personal updates 🥳: - I've completed my PhD at @unccs! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug…



Zaid Khan reposted

🤔 We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze 👀, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video…


Zaid Khan reposted

🚨 Excited to be (remotely) giving a talk tomorrow 12/2 at the "Exploring Trust and Reliability in LLM Evaluation" #NeurIPS expo workshop! I’ll be presenting our work on pragmatic training to improve calibration and persuasion, and skill-based granular evaluation for data…


Zaid Khan reposted

🏖️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I will be presenting: Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents 🗓️ Thu 4 Dec 4:30 p.m. PT — 7:30 p.m. PT | Exhibit Hall C,D,E #4412 Excited to chat about our follow-up work on…

🎉 Excited to share that Bifrost-1 has been accepted to #NeurIPS2025! ☀️ Bridging MLLMs and diffusion into a unified multimodal understanding and generation model can be very costly to train. ✨ Bifrost-1 addresses this by leveraging patch-level CLIP latents that are natively…
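
As a rough illustration of the bridging idea (dimensions, module names, and the projection head are assumptions, not Bifrost-1's actual architecture): the MLLM emits per-patch latents in CLIP's visual embedding space, which can then condition a separately trained diffusion decoder instead of training both models end to end.

```python
import torch
import torch.nn as nn

# Rough sketch of the bridging idea (dims and module names are assumptions,
# not Bifrost-1's code): the MLLM predicts per-patch latents in CLIP's visual
# embedding space; a diffusion decoder is conditioned on those latents rather
# than being trained jointly with the LLM.

class PatchLatentHead(nn.Module):
    def __init__(self, llm_dim=4096, clip_dim=1024):
        super().__init__()
        self.proj = nn.Linear(llm_dim, clip_dim)

    def forward(self, llm_hidden):
        # (batch, num_patches, llm_dim) -> (batch, num_patches, clip_dim)
        return self.proj(llm_hidden)

llm_hidden = torch.randn(1, 256, 4096)         # stand-in for MLLM outputs
patch_latents = PatchLatentHead()(llm_hidden)  # condition the diffusion model on these
```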



Zaid Khan reposted

I will be at #NeurIPS2025 to present our work: "LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits". Come visit our poster: 🗓️ Thu 4 Dec, 4:30 p.m. – 7:30 p.m. PST | Exhibit Hall C,D,E #4108 Let's connect and chat about LLM post-training, inference-time…

Reward Models (RMs) are crucial for RLHF training, but: Using a single RM: 1⃣ poor generalization, 2⃣ ambiguous judgements & 3⃣ over-optimization. Using multiple RMs simultaneously: 1⃣ resource-intensive & 2⃣ susceptible to noisy/conflicting rewards. 🚨We introduce ✨LASeR✨,…
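
A minimal UCB1 sketch of the bandit view described above (RM names and the utility signal are placeholders; LASeR's actual algorithm may differ): each reward model is an arm, and the learner adaptively picks which RM to query based on the downstream utility it has produced so far.

```python
import math
import random

# UCB1 toy (RM names and utilities are placeholders): each reward model is a
# bandit arm; pick which RM to query next based on observed downstream utility.

reward_models = ["rm_a", "rm_b", "rm_c"]
counts = {rm: 0 for rm in reward_models}
values = {rm: 0.0 for rm in reward_models}

def select_rm(t):
    for rm in reward_models:        # play every arm once first
        if counts[rm] == 0:
            return rm
    return max(reward_models, key=lambda rm:
               values[rm] + math.sqrt(2 * math.log(t) / counts[rm]))

def update(rm, utility):
    counts[rm] += 1
    values[rm] += (utility - values[rm]) / counts[rm]  # running mean

for t in range(1, 101):
    rm = select_rm(t)
    # In LASeR the utility would reflect how much the LLM improved after
    # training on rewards from this RM; here it is simulated noise.
    utility = random.gauss({"rm_a": 0.2, "rm_b": 0.5, "rm_c": 0.3}[rm], 0.1)
    update(rm, utility)

print(max(counts, key=counts.get))  # the most-selected arm, usually rm_b
```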



Zaid Khan reposted

⛱️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I am on the industry job market & will be presenting: LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits (🗓️Dec 4, 4:30PM) Excited to chat about research (reasoning, LLM agents, post-training) & job…

🎉Excited to share that LASeR has been accepted to #NeurIPS2025!☀️ RLHF with a single reward model can be prone to reward-hacking while ensembling multiple RMs is costly and prone to conflicting rewards. ✨LASeR addresses this by using multi-armed bandits to select the most…



Zaid Khan reposted

🚨 Thrilled to share Prune-Then-Plan! - VLM-based EQA agents often move back-and-forth due to miscalibration. - Our Prune-Then-Plan method filters noisy frontier choices and delegates planning to coverage-based search. - This yields stable, calibrated exploration and…

🚨Introducing our new work, Prune-Then-Plan — a method that enables AI agents to better explore 3D scenes for embodied question answering (EQA). 🧵 1/2 🟥 Existing EQA systems leverage VLMs to drive exploration choice at each step by selecting the ‘best’ next frontier, but…
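
A schematic of the two-stage idea in this thread (function names and the threshold are placeholders, not the paper's API): prune the frontier candidates the VLM scores weakly, then let a coverage-based planner choose among the survivors rather than trusting raw VLM rankings step by step.

```python
# Schematic of the two stages (function names and threshold are placeholders):
# Stage 1 prunes frontiers the VLM scores weakly; Stage 2 hands the survivors
# to a coverage-based planner instead of trusting raw VLM rankings.

def prune_then_plan(frontiers, vlm_score, coverage_gain, tau=0.6):
    kept = [f for f in frontiers if vlm_score(f) >= tau]
    if not kept:            # if the VLM is unsure everywhere, keep them all
        kept = frontiers
    return max(kept, key=coverage_gain)  # pick by expected map coverage

# Toy usage with stand-in scoring functions.
frontiers = ["door", "hallway", "window"]
vlm = {"door": 0.9, "hallway": 0.7, "window": 0.2}.get
cov = {"door": 3.0, "hallway": 5.0, "window": 1.0}.get
print(prune_then_plan(frontiers, vlm, cov))  # "hallway": kept, highest coverage
```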



Zaid Khan reposted

🚨 Check out our generative process reward model, PRInTS, that improves agents' complex, long-horizon information-seeking capabilities via: 1⃣ novel MCTS-based fine-grained information-gain scoring across multiple dimensions. 2⃣ accurate step-level guidance based on compression…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…
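
To illustrate the interface a generative PRM like this might expose (the prompt wording and helper names here are hypothetical, not PRInTS's code): compress the long tool-call history into a summary, then ask the PRM to rate each candidate step's information gain and act on the best one.

```python
# Hypothetical scoring interface (prompt wording and helper names are mine,
# not PRInTS's code): compress the long tool-call history into a summary,
# then ask the generative PRM to rate each candidate step's information gain.

def score_step(prm_generate, summary, candidate_step):
    """Return an info-gain score in [0, 1] from a generative PRM."""
    prompt = (
        "Trajectory summary:\n" + summary +
        "\nCandidate next step:\n" + candidate_step +
        "\nHow much new task-relevant information does this step add? "
        "Answer with a single number between 0 and 1."
    )
    return float(prm_generate(prompt))

def pick_step(prm_generate, summarize, history, candidates):
    summary = summarize(history)  # summarization keeps the context small
    return max(candidates,
               key=lambda step: score_step(prm_generate, summary, step))
```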



Thanks @_akhaliq for posting about our work on guiding agents for long-horizon information-seeking tasks using a generative process reward model! For more details, see the original thread: x.com/ArchikiPrasad/…

PRInTS: Reward Modeling for Long-Horizon Information Seeking



Zaid Khan reposted

PRInTS: Reward Modeling for Long-Horizon Information Seeking


We want agents to solve problems that require searching and exploring multiple paths over long horizons, such as complex information-seeking tasks that require the agent to answer questions by exploring the internet. Process Reward Models (PRMs) are a promising approach which…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

Long-horizon information-seeking tasks remain challenging for LLM agents, and existing PRMs (step-wise process reward models) fall short because: 1⃣ the reasoning process involves interleaved tool calls and responses 2⃣ the context grows rapidly due to the extended task horizon…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

🚨 PRInTS addresses key challenges for info-seeking agents + is compatible with trained/generalist LLMs (both open- + closed-source) by guiding agents towards better queries/actions in long-horizon tasks (GAIA, FRAMES, WebWalkerQA), w/ strong gains (e.g. +9.3% for Qwen 32B, +~4%…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…


Zaid Khan reposted

🚨 Excited to share SketchVerify — a framework that scales trajectory planning for video generation. ➡️ Sketch-level motion previews let us search dozens of trajectory candidates instantly — without paying the cost of the time-consuming diffusion process. ➡️ A multimodal…
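
A minimal sketch of the search-then-verify loop the tweet describes (all names are stand-ins, not the paper's code): sample many trajectory candidates, render each as a cheap motion preview instead of running the full diffusion model, and keep the candidate the verifier scores highest.

```python
import random

# Stand-in for the search-then-verify loop (all names hypothetical): sample
# candidates, render each as a cheap motion sketch rather than a full video,
# and keep the candidate the verifier prefers.

def sketch_verify(sample_trajectory, render_sketch, verifier_score, n=32):
    candidates = [sample_trajectory() for _ in range(n)]
    sketches = [render_sketch(t) for t in candidates]      # cheap previews
    scores = [verifier_score(s) for s in sketches]         # e.g., an MLLM judge
    return candidates[max(range(n), key=scores.__getitem__)]

best = sketch_verify(
    sample_trajectory=lambda: [(random.random(), random.random()) for _ in range(8)],
    render_sketch=lambda traj: traj,                       # identity "render" here
    verifier_score=lambda sketch: -sum(x for x, _ in sketch),  # dummy scorer
)
```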


Zaid Khan reposted

🚨 Thrilled to introduce DEER-3D: Error-Driven Scene Editing for 3D Grounding in Large Language Models - Introduces an error-driven scene editing framework to improve 3D visual grounding in 3D-LLMs. - Generates targeted 3D counterfactual edits that directly challenge the…
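
At a high level, the loop the tweet outlines might look like the following (function names are placeholders, not DEER-3D's API): evaluate where the 3D-LLM grounds incorrectly, synthesize counterfactual scene edits targeting exactly those errors, and fine-tune on the edited scenes.

```python
# High-level placeholder loop (names are not DEER-3D's API): find grounding
# errors, synthesize counterfactual scene edits that target them, fine-tune.

def error_driven_editing(model, scenes, evaluate, edit_scene, finetune, rounds=3):
    for _ in range(rounds):
        errors = [(s, e) for s in scenes for e in evaluate(model, s)]
        # e.g., swap a distractor's color/position so the old mistake is punished
        edited = [edit_scene(scene, err) for scene, err in errors]
        model = finetune(model, edited)
    return model
```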


Zaid Khan reposted

Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification


Zaid Khan reposted

We live, feel, and create by perceiving the world as visual spaces unfolding through time — videos. Our memories and even our language are spatial: mind-palaces, mind-maps, "taking steps in the right direction..." Super excited to see Cambrian-S pushing this frontier! And,…

Introducing Cambrian-S. It's a position, a dataset, a benchmark, and a model, but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶



Zaid Khan reposted

MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time. the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone. introducing SIMS-V 🤖 [1/n]


Zaid Khan reposted

I'll be presenting ✨MAgICoRe✨ virtually tonight at 7 PM ET / 8 AM CST (Gather Session 3)! I'll discuss 3 key challenges in LLM refinement for reasoning, and how MAgICoRe tackles them jointly: 1⃣ Over-correction on easy problems 2⃣ Failure to localize & fix its own errors 3⃣…
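
A minimal control-flow sketch of how the three challenges could be handled jointly (thresholds and helper names are my assumptions, not MAgICoRe's implementation): skip refinement when the model is already confident, and otherwise localize errors before each targeted revision.

```python
# Control-flow sketch only (thresholds/helpers are my assumptions): avoid
# over-correcting easy problems, and localize errors before refining hard ones.

def adaptive_refine(problem, solve, confidence, localize_errors, refine,
                    easy_thresh=0.9, max_iters=3):
    answer = solve(problem)
    if confidence(problem, answer) >= easy_thresh:
        return answer  # easy case: refinement risks over-correction (challenge 1)
    for _ in range(max_iters):
        errors = localize_errors(problem, answer)  # step-level spans (challenge 2)
        if not errors:
            break
        answer = refine(problem, answer, errors)   # targeted revision (challenge 3)
    return answer
```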

🚨 Check out our awesome students/postdocs' papers at #EMNLP2025 and say hi to them 👋! Also, I will give a keynote (virtually) on "Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval" at the NewSumm workshop. -- Jaehong (in-person) finished…


