
Viraj Prabhu

@virprabh

Research Scientist at Salesforce AI. Georgia Tech PhD. Interested in all things computer vision / machine learning.

Pinned Tweet

Check out our latest work on building Web Agents that Learn Tools (WALT) to get more done faster! 🧵👇🏻

(Thread 1/4) Announcing WALT — Web Agents that Learn Tools 🛠️ WALT reverse-engineers existing web automations (search, comment, filter) → reusable tools that allow agents to focus on higher-level reasoning rather than choreographing clicks. This abstraction transforms the…
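
To make the "clicks become reusable tools" idea concrete, here is a minimal, purely illustrative sketch (not the WALT implementation; all names are hypothetical) of how a discovered tool could wrap a low-level click/type sequence behind one parameterized call:

```python
# Hypothetical sketch, NOT the WALT codebase: a "tool" wraps a low-level
# click/type sequence behind one reusable, parameterized call.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BrowserStep:
    action: str      # e.g. "click", "type", "press"
    selector: str    # CSS/XPath selector the step targets
    value: str = ""  # text to type or key to press, if any


@dataclass
class Tool:
    name: str
    parameters: List[str]
    steps: Callable[[Dict[str, str]], List[BrowserStep]]


# A discovered "search" tool: instead of choreographing three separate
# clicks/keystrokes per query, the agent issues one call like search(query=...).
search_tool = Tool(
    name="search",
    parameters=["query"],
    steps=lambda args: [
        BrowserStep("click", "#search-box"),
        BrowserStep("type", "#search-box", args["query"]),
        BrowserStep("press", "#search-box", "Enter"),
    ],
)

if __name__ == "__main__":
    for step in search_tool.steps({"query": "hiking boots"}):
        print(step)
```

With that abstraction, the agent's plan is a short sequence of tool calls rather than a long sequence of atomic UI actions, which is where the higher-level reasoning framing comes from.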



Viraj Prabhu reposted

🚀 Introducing BLIP3o-NEXT from @SFResearch -- a fully open-source foundation model that unifies text-to-image generation and image editing within a single architecture. Key insights: 1️⃣ Architecture-wise: most design choices show comparable performance — what matters is…


Viraj Prabhu reposted

Browser agents — and agents in general — should learn to discover and use higher-level skills rather than executing low-level atomic actions. WALT turns unsupervised web interactions into structured, reusable skills, enabling agents to act with fewer steps and greater…

Humans don’t just use tools — we invent them. That’s the next frontier for AI agents. At @SFResearch, we’re introducing WALT (Web Agents that Learn Tools) — a framework that teaches browser agents to discover and reverse-engineer a website’s hidden functionality into reusable…



Viraj Prabhu reposted

Humans don’t just use tools — we invent them. That’s the next frontier for AI agents. At @SFResearch, we’re introducing WALT (Web Agents that Learn Tools) — a framework that teaches browser agents to discover and reverse-engineer a website’s hidden functionality into reusable…


Viraj Prabhu reposted

(3/4) Outcome: up to 30% higher success rates with 1.4x fewer steps / LLM-calls (new SoTA on VisualWebArena) 📈 Here’s another example of finding stay options on Airbnb: Baseline web agent (left), WALT agent (right).


Viraj Prabhu reposted

(4/4) We provide a simple CLI for discovery/serving (MCP) with WALT – try it out with 🚀 walt discover <your-url>; walt agent <your-task> --start-url <your-url> 📝 Paper: bit.ly/4nhJf0K 🔗 Code: bit.ly/47gMAXZ Authors: @virprabh, @yutong_dai, Matthew Fernandez,…


Viraj Prabhu reposted

Thank you to the award committee and the broader vision community for the recognition. After all these (21!) years and so many conferences across sub-disciplines in AI, the vision community continues to feel like home. What makes this extra special is that the original VQA…


I'll be presenting this at the first poster session tomorrow (Oct 21, 11.45am, Exhibit Hall I #301) – stop by if you're attending #ICCV2025! 🏖️


💥 Super excited to introduce our latest work on **programmatically** benchmarking vision-language models in the wild 👇



Viraj Prabhu reposted

Thank you so much Caiming! We show that adding coding as a new action type, alongside GUI actions, significantly improves a CUA's computer-use performance while reducing the total number of actions needed to solve a task. If you are interested, please take a look at…

🚀 Computer-using agents represent a powerful new paradigm for human-computer interaction. Over the past year, we’ve explored multiple approaches to tackle the key challenges in building robust CUA systems. 12/2024 we released Aguvis (x.com/CaimingXiong/s…) 07/2024 we released…
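
As a rough illustration of the "code as an action" point above (not any released CUA system; every name here is hypothetical), a mixed action space lets the agent either issue a GUI primitive or run a short program that replaces many GUI steps at once:

```python
# Illustrative only: a mixed action space combining GUI primitives with code
# actions, reflecting the tweet's claim that code actions cut total actions.
from dataclasses import dataclass
from typing import Union


@dataclass
class GuiAction:
    kind: str     # "click" | "type" | "scroll"
    target: str   # element description or screen coordinate
    text: str = ""


@dataclass
class CodeAction:
    program: str  # a code snippet executed in the environment


Action = Union[GuiAction, CodeAction]


def execute(action: Action) -> None:
    """Dispatch one action; a real agent loop would route these to the OS/browser."""
    if isinstance(action, CodeAction):
        # One code action can stand in for a long chain of clicks
        # (e.g. renaming 100 files), which is where the step savings come from.
        print(f"run code:\n{action.program}")
    else:
        print(f"gui {action.kind} on {action.target} {action.text}".strip())


if __name__ == "__main__":
    execute(GuiAction("click", "Save button"))
    execute(CodeAction("import os\nprint(sorted(os.listdir('.')))"))
```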



Viraj Prabhu reposted

🚀 Computer-using agents represent a powerful new paradigm for human-computer interaction. Over the past year, we’ve explored multiple approaches to tackle the key challenges in building robust CUA systems. 12/2024 we released Aguvis (x.com/CaimingXiong/s…) 07/2024 we released…


Meet AGUVIS: A pure vision-based framework for autonomous GUI agents, operating seamlessly across web, desktop, and mobile platforms without UI code. Key Features & Contributions 🔍 Pure Vision Framework: First fully autonomous pure vision GUI agent capable of performing tasks…



Happening now in 208B, come check out the first EMACS workshop! #CVPR2025


Join us at the first-ever EMACS workshop @CVPR! 🚨 Submissions open March 5: tinyurl.com/emacs25 See you in Nashville! 🎸 #CVPR2025



Viraj Prabhu reposted

🚨🚨 Paper submission deadline extended to May 4. Submit your work (in-progress or complete!) to the EMACS workshop @CVPR2025 in Nashville! Submission link: tinyurl.com/emacs2025submit #CVPR2025 #GenerativeAI #bias

🚀 Excited about how generative AI can power experimental (not just observational) audits of ML systems that reveal actionable insights into performance and bias? Join us at the first-ever EMACS workshop @CVPR2025 in Nashville! 🌟 Speakers & submissions: sites.google.com/view/emacs2025/



Viraj Prabhu reposted

🚀 Excited about how generative AI can power experimental (not just observational) audits of ML systems that reveal actionable insights into performance and bias? Join us at EMACS (Experimental Model Auditing with Controllable Synthesis) workshop @CVPR! sites.google.com/view/emacs2025/


Join us at the first-ever EMACS workshop @CVPR! 🚨 Submissions open March 5: tinyurl.com/emacs25 See you in Nashville! 🎸 #CVPR2025

🚀 Excited about how generative AI can power experimental (not just observational) audits of ML systems that reveal actionable insights into performance and bias? Join us at the first-ever EMACS workshop @CVPR2025 in Nashville! 🌟 Speakers & submissions: sites.google.com/view/emacs2025/



Viraj Prabhu reposted

Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model! Gaze-LLE achieves SOTA results on multiple benchmarks while learning minimal parameters, and shows strong generalization paper: arxiv.org/abs/2412.09586
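
The recipe the tweet describes (a frozen visual foundation model plus a small learned head) follows a common pattern; here is a minimal sketch of that pattern in PyTorch, assuming a generic feature-vector backbone. This is not the actual Gaze-LLE architecture or code:

```python
# Rough sketch of "frozen foundation model + small learned head";
# NOT the Gaze-LLE implementation (see arxiv.org/abs/2412.09586 for the real one).
import torch
import torch.nn as nn


class FrozenBackboneGazeHead(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the foundation model
            p.requires_grad = False
        # Only this small head is trained, so very few parameters are learned.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.GELU(), nn.Linear(256, 2)
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                   # backbone stays frozen
            feats = self.backbone(images)       # (B, feat_dim) features
        return self.head(feats)                 # (B, 2) predicted gaze target (x, y)


if __name__ == "__main__":
    # Stand-in backbone: any frozen image encoder that outputs a feature vector.
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 768))
    model = FrozenBackboneGazeHead(backbone)
    print(model(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 2])
```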


Looking forward to some Miami sun this week at #EMNLP2024, my first NLP conference in ~7 years! ☀️ HMU if you’d like to learn more about our work at @SFResearch or just meet/catch up! 🍹


Viraj Prabhu reposted

🤔 Ever wondered why merging LoRA models is trickier than fully-finetuned ones? 🔍 We explore this and discover that poor alignment between LoRA models leads to subpar merging. 💡 The solution? KnOTS 🪢: our latest work that uses SVD to improve alignment and boosts SOTA merging methods.

Model merging is tricky when model weights aren’t aligned. Introducing KnOTS 🪢: a gradient-free framework to merge LoRA models. KnOTS is plug-and-play, boosting SoTA merging methods by up to 4.3% 🚀 📜: arxiv.org/abs/2410.19735 💻: github.com/gstoica27/KnOTS
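
A toy sketch of the idea in these tweets, under the assumption that each task contributes a low-rank weight update ΔW_i = B_i A_i: express the updates in a shared SVD basis so they are aligned, merge the aligned components, and map back to weight space. This is only an illustration of the intuition, not the official KnOTS implementation (see github.com/gstoica27/KnOTS):

```python
# Toy illustration of "align LoRA deltas via a shared SVD basis, then merge";
# not the official KnOTS code. Simple averaging stands in for a SoTA merger.
import torch


def merge_lora_deltas(deltas: list) -> torch.Tensor:
    """Merge per-task LoRA weight updates (each of shape d_out x d_in)."""
    # Concatenate the task updates and take an SVD to get a shared basis U.
    stacked = torch.cat(deltas, dim=1)                  # (d_out, k * d_in)
    U, S, Vh = torch.linalg.svd(stacked, full_matrices=False)
    # Express each task update in the shared basis (alignment step),
    # then combine the aligned coefficients; averaging is a placeholder
    # for whatever merging method one plugs in.
    coeffs = [U.T @ d for d in deltas]
    merged_coeff = torch.stack(coeffs).mean(dim=0)
    return U @ merged_coeff                             # back to weight space


if __name__ == "__main__":
    d_out, d_in, rank = 64, 32, 4
    deltas = [torch.randn(d_out, rank) @ torch.randn(rank, d_in) for _ in range(3)]
    print(merge_lora_deltas(deltas).shape)  # torch.Size([64, 32])
```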



Viraj Prabhu reposted

Introducing EgoMimic - just wear a pair of Project Aria @meta_aria smart glasses 👓 to scale up your imitation learning datasets! Check out what our robot can do. A thread below👇


Viraj Prabhu reposted

Evaluate hallucination in your VLMs using our new benchmark

🚨🚨🚨Introducing PROVE: A new programmatic benchmark for evaluating vision-language models (VLMs). VLMs often provide responses that are unhelpful, contain false claims about the image, or both. However, benchmarking this in the wild can be surprisingly hard! Enter PROVE,…


