
Zaid Khan

@codezakh

NDSEG Fellow / PhD @uncnlp with @mohitban47, working on automating env/data generation + program synthesis. Formerly @allenai @neclabsamerica.

Pinned

How can an agent reverse engineer the underlying laws of an unknown, hostile & stochastic environment in “one life”, without millions of steps + human-provided goals / rewards? In our work, we: 1️⃣ infer an executable symbolic world model (a probabilistic program capturing…
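
To make the pinned idea concrete, here is a minimal sketch (the toy environment and all names are invented for illustration, not taken from the paper): maintain a small space of candidate symbolic dynamics rules and re-weight them online from a single episode's observations, with no resets and no external reward.

```python
import math
import random

# Toy illustration (environment and hypothesis space invented for this sketch):
# re-weight candidate symbolic dynamics rules online from one life's
# observations, with no resets and no external reward signal.

def rule_slip(p_slip):
    """Candidate dynamics: an action succeeds w.p. 1 - p_slip, else no-op."""
    def loglik(state, action, next_state):
        moved = next_state != state
        p = (1 - p_slip) if moved else p_slip
        return math.log(max(p, 1e-9))  # guard against log(0)
    return loglik

# Stand-in hypothesis space: a one-parameter family instead of full programs.
hypotheses = {f"slip={p:.1f}": rule_slip(p) for p in (0.0, 0.1, 0.3, 0.5)}
log_post = {h: 0.0 for h in hypotheses}  # uniform prior, log-space

def observe(state, action, next_state):
    """Bayesian update of the posterior over candidate world models."""
    for h, loglik in hypotheses.items():
        log_post[h] += loglik(state, action, next_state)

# One "life" in a 1-D world whose true slip probability is 0.3.
state = 0
for _ in range(50):
    action = random.choice([-1, 1])
    next_state = state + action if random.random() > 0.3 else state
    observe(state, action, next_state)
    state = next_state

print(max(log_post, key=log_post.get))  # usually recovers slip=0.3
```

In the actual work the hypothesis space is a space of probabilistic programs rather than a one-parameter family, but the online re-weighting from a single trajectory is the same shape of problem.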


Zaid Khan reposted

#NeurIPS2025 is live! I'll be in San Diego through Saturday (Dec 06) and would love to meet prospective graduate students interested in joining my lab at JHU. If you're excited about multimodal AI, robotics, unified models, learning action/motion from video, etc., let's chat!…

Sharing some personal updates 🥳: - I've completed my PhD at @unccs! 🎓 - Starting Fall 2026, I'll be joining the Computer Science dept. at Johns Hopkins University (@JHUCompSci) as an Assistant Professor 💙 - Currently exploring options + finalizing the plan for my gap year (Aug…



Zaid Khan reposted

🤔 We rely on gaze to guide our actions, but can current MLLMs truly understand it and infer our intentions? Introducing StreamGaze 👀, the first benchmark that evaluates gaze-guided temporal reasoning (past, present, and future) and proactive understanding in streaming video…


Zaid Khan reposted

🚨 Excited to be (remotely) giving a talk tomorrow 12/2 at the "Exploring Trust and Reliability in LLM Evaluation" #NeurIPS expo workshop! I’ll be presenting our work on pragmatic training to improve calibration and persuasion, and skill-based granular evaluation for data…


Zaid Khan reposted

🏖️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I will be presenting: Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents 🗓️ Thu 4 Dec 4:30 p.m. PT — 7:30 p.m. PT | Exhibit Hall C,D,E #4412 Excited to chat about our follow-up work on…

🎉 Excited to share that Bifrost-1 has been accepted to #NeurIPS2025! ☀️ Bridging MLLMs and diffusion into a unified multimodal understanding and generation model can be very costly to train. ✨ Bifrost-1 addresses this by leveraging patch-level CLIP latents that are natively…
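
As a rough illustration of the bridging idea (dimensions, module names, and the projection head are assumptions, not Bifrost-1's actual architecture): the MLLM emits per-patch latents in CLIP's visual embedding space, which can then condition a separately trained diffusion decoder instead of training both models end to end.

```python
import torch
import torch.nn as nn

# Rough sketch of the bridging idea (dims and module names are assumptions,
# not Bifrost-1's code): the MLLM predicts per-patch latents in CLIP's visual
# embedding space; a diffusion decoder is conditioned on those latents rather
# than being trained jointly with the LLM.

class PatchLatentHead(nn.Module):
    def __init__(self, llm_dim=4096, clip_dim=1024):
        super().__init__()
        self.proj = nn.Linear(llm_dim, clip_dim)

    def forward(self, llm_hidden):
        # (batch, num_patches, llm_dim) -> (batch, num_patches, clip_dim)
        return self.proj(llm_hidden)

llm_hidden = torch.randn(1, 256, 4096)         # stand-in for MLLM outputs
patch_latents = PatchLatentHead()(llm_hidden)  # condition the diffusion model on these
```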



Zaid Khan reposted

I will be at #NeurIPS2025 to present our work: "LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits". Come visit our poster: 🗓️ Thu 4 Dec, 4:30 p.m. – 7:30 p.m. PST | Exhibit Hall C,D,E #4108 Let's connect and chat about LLM post-training, inference-time…

Reward Models (RMs) are crucial for RLHF training, but: Using a single RM: 1⃣ poor generalization, 2⃣ ambiguous judgements & 3⃣ over-optimization. Using multiple RMs simultaneously: 1⃣ resource-intensive & 2⃣ susceptible to noisy/conflicting rewards. 🚨We introduce ✨LASeR✨,…
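
A minimal UCB1 sketch of the bandit view described above (RM names and the utility signal are placeholders; LASeR's actual algorithm may differ): each reward model is an arm, and the learner adaptively picks which RM to query based on the downstream utility it has produced so far.

```python
import math
import random

# UCB1 toy (RM names and utilities are placeholders): each reward model is a
# bandit arm; pick which RM to query next based on observed downstream utility.

reward_models = ["rm_a", "rm_b", "rm_c"]
counts = {rm: 0 for rm in reward_models}
values = {rm: 0.0 for rm in reward_models}

def select_rm(t):
    for rm in reward_models:        # play every arm once first
        if counts[rm] == 0:
            return rm
    return max(reward_models, key=lambda rm:
               values[rm] + math.sqrt(2 * math.log(t) / counts[rm]))

def update(rm, utility):
    counts[rm] += 1
    values[rm] += (utility - values[rm]) / counts[rm]  # running mean

for t in range(1, 101):
    rm = select_rm(t)
    # In LASeR the utility would reflect how much the LLM improved after
    # training on rewards from this RM; here it is simulated noise.
    utility = random.gauss({"rm_a": 0.2, "rm_b": 0.5, "rm_c": 0.3}[rm], 0.1)
    update(rm, utility)

print(max(counts, key=counts.get))  # the most-selected arm, usually rm_b
```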



Zaid Khan reposted

⛱️ Heading to San Diego for #NeurIPS (Dec 2-7th)! I am on the industry job market & will be presenting: LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits (🗓️Dec 4, 4:30PM) Excited to chat about research (reasoning, LLM agents, post-training) & job…

🎉Excited to share that LASeR has been accepted to #NeurIPS2025!☀️ RLHF with a single reward model can be prone to reward-hacking while ensembling multiple RMs is costly and prone to conflicting rewards. ✨LASeR addresses this by using multi-armed bandits to select the most…



Zaid Khan reposted

🚨 Thrilled to share Prune-Then-Plan! - VLM-based EQA agents often move back-and-forth due to miscalibration. - Our Prune-Then-Plan method filters noisy frontier choices and delegates planning to coverage-based search. - This yields stable, calibrated exploration and…

🚨Introducing our new work, Prune-Then-Plan — a method that enables AI agents to better explore 3D scenes for embodied question answering (EQA). 🧵 1/2 🟥 Existing EQA systems leverage VLMs to drive exploration choice at each step by selecting the ‘best’ next frontier, but…
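
A schematic of the two-stage idea in this thread (function names and the threshold are placeholders, not the paper's API): prune the frontier candidates the VLM scores weakly, then let a coverage-based planner choose among the survivors rather than trusting raw VLM rankings step by step.

```python
# Schematic of the two stages (function names and threshold are placeholders):
# Stage 1 prunes frontiers the VLM scores weakly; Stage 2 hands the survivors
# to a coverage-based planner instead of trusting raw VLM rankings.

def prune_then_plan(frontiers, vlm_score, coverage_gain, tau=0.6):
    kept = [f for f in frontiers if vlm_score(f) >= tau]
    if not kept:            # if the VLM is unsure everywhere, keep them all
        kept = frontiers
    return max(kept, key=coverage_gain)  # pick by expected map coverage

# Toy usage with stand-in scoring functions.
frontiers = ["door", "hallway", "window"]
vlm = {"door": 0.9, "hallway": 0.7, "window": 0.2}.get
cov = {"door": 3.0, "hallway": 5.0, "window": 1.0}.get
print(prune_then_plan(frontiers, vlm, cov))  # "hallway": kept, highest coverage
```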



Zaid Khan reposted

🚨 Check out our generative process reward model, PRInTS, that improves agents' complex, long-horizon information-seeking capabilities via: 1⃣ novel MCTS-based fine-grained information-gain scoring across multiple dimensions. 2⃣ accurate step-level guidance based on compression…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…
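
To illustrate the interface a generative PRM like this might expose (the prompt wording and helper names here are hypothetical, not PRInTS's code): compress the long tool-call history into a summary, then ask the PRM to rate each candidate step's information gain and act on the best one.

```python
# Hypothetical scoring interface (prompt wording and helper names are mine,
# not PRInTS's code): compress the long tool-call history into a summary,
# then ask the generative PRM to rate each candidate step's information gain.

def score_step(prm_generate, summary, candidate_step):
    """Return an info-gain score in [0, 1] from a generative PRM."""
    prompt = (
        "Trajectory summary:\n" + summary +
        "\nCandidate next step:\n" + candidate_step +
        "\nHow much new task-relevant information does this step add? "
        "Answer with a single number between 0 and 1."
    )
    return float(prm_generate(prompt))

def pick_step(prm_generate, summarize, history, candidates):
    summary = summarize(history)  # summarization keeps the context small
    return max(candidates,
               key=lambda step: score_step(prm_generate, summary, step))
```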



Thanks @_akhaliq for posting about our work on guiding agents for long-horizon information-seeking tasks using a generative process reward model! For more details, see the original thread: x.com/ArchikiPrasad/…

PRInTS: Reward Modeling for Long-Horizon Information Seeking



Zaid Khan reposted

PRInTS: Reward Modeling for Long-Horizon Information Seeking


We want agents to solve problems that require searching and exploring multiple paths over long horizons, such as complex information-seeking tasks that require the agent to answer questions by exploring the internet. Process Reward Models (PRMs) are a promising approach which…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

Long-horizon information-seeking tasks remain challenging for LLM agents, and existing PRMs (step-wise process reward models) fall short because: 1⃣ the reasoning process involves interleaved tool calls and responses 2⃣ the context grows rapidly due to the extended task horizon…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

🚨 PRInTS addresses key challenges for info-seeking agents + is compatible with trained/generalist LLMs (both open- + closed-source) by guiding agents towards better queries/actions in long-horizon tasks (GAIA, FRAMES, WebWalkerQA), w/ strong gains (e.g. +9.3% for Qwen 32B, +~4%…

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…



Zaid Khan reposted

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents' long-horizon info-seeking via info-gain scoring + summarization. PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES &…


Zaid Khan reposted

🚨 Excited to share SketchVerify — a framework that scales trajectory planning for video generation. ➡️ Sketch-level motion previews let us search dozens of trajectory candidates instantly — without paying the cost of the time-consuming diffusion process. ➡️ A multimodal…
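
A minimal sketch of the search-then-verify loop the tweet describes (all names are stand-ins, not the paper's code): sample many trajectory candidates, render each as a cheap motion preview instead of running the full diffusion model, and keep the candidate the verifier scores highest.

```python
import random

# Stand-in for the search-then-verify loop (all names hypothetical): sample
# candidates, render each as a cheap motion sketch rather than a full video,
# and keep the candidate the verifier prefers.

def sketch_verify(sample_trajectory, render_sketch, verifier_score, n=32):
    candidates = [sample_trajectory() for _ in range(n)]
    sketches = [render_sketch(t) for t in candidates]      # cheap previews
    scores = [verifier_score(s) for s in sketches]         # e.g., an MLLM judge
    return candidates[max(range(n), key=scores.__getitem__)]

best = sketch_verify(
    sample_trajectory=lambda: [(random.random(), random.random()) for _ in range(8)],
    render_sketch=lambda traj: traj,                       # identity "render" here
    verifier_score=lambda sketch: -sum(x for x, _ in sketch),  # dummy scorer
)
```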


Zaid Khan reposted

🚨 Thrilled to introduce DEER-3D: Error-Driven Scene Editing for 3D Grounding in Large Language Models - Introduces an error-driven scene editing framework to improve 3D visual grounding in 3D-LLMs. - Generates targeted 3D counterfactual edits that directly challenge the…
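
At a high level, the loop the tweet outlines might look like the following (function names are placeholders, not DEER-3D's API): evaluate where the 3D-LLM grounds incorrectly, synthesize counterfactual scene edits targeting exactly those errors, and fine-tune on the edited scenes.

```python
# High-level placeholder loop (names are not DEER-3D's API): find grounding
# errors, synthesize counterfactual scene edits that target them, fine-tune.

def error_driven_editing(model, scenes, evaluate, edit_scene, finetune, rounds=3):
    for _ in range(rounds):
        errors = [(s, e) for s in scenes for e in evaluate(model, s)]
        # e.g., swap a distractor's color/position so the old mistake is punished
        edited = [edit_scene(scene, err) for scene, err in errors]
        model = finetune(model, edited)
    return model
```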


Zaid Khan reposted

Today, we’re announcing the next chapter of Terminal-Bench with two releases: 1. Harbor, a new package for running sandboxed agent rollouts at scale 2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification


Zaid Khan reposted

We live, feel, and create by perceiving the world as visual spaces unfolding through time — videos. Our memories and even our language are spatial: mind-palaces, mind-maps, "taking steps in the right direction..." Super excited to see Cambrian-S pushing this frontier! And,…

Introducing Cambrian-S. It's a position, a dataset, a benchmark, and a model, but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶



Zaid Khan reposted

MLLMs are great at understanding videos, but struggle with spatial reasoning—like estimating distances or tracking objects across time. the bottleneck? getting precise 3D spatial annotations on real videos is expensive and error-prone. introducing SIMS-V 🤖 [1/n]


Zaid Khan reposted

I'll be presenting ✨MAgICoRe✨ virtually tonight at 7 PM ET / 8 AM CST (Gather Session 3)! I'll discuss 3 key challenges in LLM refinement for reasoning, and how MAgICoRe tackles them jointly: 1⃣ Over-correction on easy problems 2⃣ Failure to localize & fix its own errors 3⃣…
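
A minimal control-flow sketch of how the three challenges could be handled jointly (thresholds and helper names are my assumptions, not MAgICoRe's implementation): skip refinement when the model is already confident, and otherwise localize errors before each targeted revision.

```python
# Control-flow sketch only (thresholds/helpers are my assumptions): avoid
# over-correcting easy problems, and localize errors before refining hard ones.

def adaptive_refine(problem, solve, confidence, localize_errors, refine,
                    easy_thresh=0.9, max_iters=3):
    answer = solve(problem)
    if confidence(problem, answer) >= easy_thresh:
        return answer  # easy case: refinement risks over-correction (challenge 1)
    for _ in range(max_iters):
        errors = localize_errors(problem, answer)  # step-level spans (challenge 2)
        if not errors:
            break
        answer = refine(problem, answer, errors)   # targeted revision (challenge 3)
    return answer
```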

🚨 Check out our awesome students/postdocs' papers at #EMNLP2025 and say hi to them 👋! Also, I will give a keynote (virtually) on "Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval" at the NewSumm workshop. -- Jaehong (in-person) finished…


