
Viraj Prabhu

@virprabh

Research Scientist at Salesforce AI. Georgia Tech PhD. Interested in all things computer vision / machine learning.

Pinned Tweet

Check out our latest work on building Web Agents that Learn Tools (WALT) to get more done faster! 🧵👇🏻

(Thread 1/4) Announcing WALT — Web Agents that Learn Tools 🛠️ WALT reverse-engineers existing web automations (search, comment, filter) → reusable tools that allow agents to focus on higher-level reasoning rather than choreographing clicks. This abstraction transforms the…
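
To make the "clicks become reusable tools" idea concrete, here is a minimal, purely illustrative sketch (not the WALT implementation; all names are hypothetical) of how a discovered tool could wrap a low-level click/type sequence behind one parameterized call:

```python
# Hypothetical sketch, NOT the WALT codebase: a "tool" wraps a low-level
# click/type sequence behind one reusable, parameterized call.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BrowserStep:
    action: str      # e.g. "click", "type", "press"
    selector: str    # CSS/XPath selector the step targets
    value: str = ""  # text to type or key to press, if any


@dataclass
class Tool:
    name: str
    parameters: List[str]
    steps: Callable[[Dict[str, str]], List[BrowserStep]]


# A discovered "search" tool: instead of choreographing three separate
# clicks/keystrokes per query, the agent issues one call like search(query=...).
search_tool = Tool(
    name="search",
    parameters=["query"],
    steps=lambda args: [
        BrowserStep("click", "#search-box"),
        BrowserStep("type", "#search-box", args["query"]),
        BrowserStep("press", "#search-box", "Enter"),
    ],
)

if __name__ == "__main__":
    for step in search_tool.steps({"query": "hiking boots"}):
        print(step)
```

With that abstraction, the agent's plan is a short sequence of tool calls rather than a long sequence of atomic UI actions, which is where the higher-level reasoning framing comes from.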



Viraj Prabhu reposted

🚀 Introducing BLIP3o-NEXT from @SFResearch -- a fully open-source foundation model that unifies text-to-image generation and image editing within a single architecture. Key insights: 1️⃣ Architecture-wise: most design choices show comparable performance — what matters is…


Viraj Prabhu reposted

Browser agents — and agents in general — should learn to discover and use higher-level skills rather than executing low-level atomic actions. WALT turns unsupervised web interactions into structured, reusable skills, enabling agents to act with fewer steps and greater…

Humans don’t just use tools — we invent them. That’s the next frontier for AI agents. At @SFResearch, we’re introducing WALT (Web Agents that Learn Tools) — a framework that teaches browser agents to discover and reverse-engineer a website’s hidden functionality into reusable…



Viraj Prabhu reposted

Humans don’t just use tools — we invent them. That’s the next frontier for AI agents. At @SFResearch, we’re introducing WALT (Web Agents that Learn Tools) — a framework that teaches browser agents to discover and reverse-engineer a website’s hidden functionality into reusable…


Viraj Prabhu reposted

(3/4) Outcome: up to 30% higher success rates with 1.4x fewer steps / LLM-calls (new SoTA on VisualWebArena) 📈 Here’s another example of finding stay options on Airbnb: Baseline web agent (left), WALT agent (right).


Viraj Prabhu reposted

(4/4) We provide a simple CLI for discovery/serving (MCP) with WALT – try it out with 🚀 walt discover <your-url>; walt agent <your-task> --start-url <your-url> 📝 Paper: bit.ly/4nhJf0K 🔗 Code: bit.ly/47gMAXZ Authors: @virprabh, @yutong_dai, Matthew Fernandez,…


Viraj Prabhu reposted

Thank you to the award committee and the broader vision community for the recognition. After all these (21!) years and so many conferences across sub-disciplines in AI, the vision community continues to feel like home. What makes this extra special is that the original VQA…


I'll be presenting this at the first poster session tomorrow (Oct 21, 11.45am, Exhibit Hall I #301) – stop by if you're attending #ICCV2025! 🏖️


💥 Super excited to introduce our latest work on **programmatically** benchmarking vision-language models in the wild 👇



Viraj Prabhu reposted

Thank you so much Caiming! We show that adding coding as a new action type, alongside GUI actions, significantly improves a CUA's computer-use performance while reducing the total number of actions needed to solve a task. If you are interested, please take a look at…

🚀 Computer-using agents represent a powerful new paradigm for human-computer interaction. Over the past year, we’ve explored multiple approaches to tackle the key challenges in building robust CUA systems. 12/2024 we released Aguvis (x.com/CaimingXiong/s…) 07/2024 we released…
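
As a rough illustration of the "code as an action" point above (not any released CUA system; every name here is hypothetical), a mixed action space lets the agent either issue a GUI primitive or run a short program that replaces many GUI steps at once:

```python
# Illustrative only: a mixed action space combining GUI primitives with code
# actions, reflecting the tweet's claim that code actions cut total actions.
from dataclasses import dataclass
from typing import Union


@dataclass
class GuiAction:
    kind: str     # "click" | "type" | "scroll"
    target: str   # element description or screen coordinate
    text: str = ""


@dataclass
class CodeAction:
    program: str  # a code snippet executed in the environment


Action = Union[GuiAction, CodeAction]


def execute(action: Action) -> None:
    """Dispatch one action; a real agent loop would route these to the OS/browser."""
    if isinstance(action, CodeAction):
        # One code action can stand in for a long chain of clicks
        # (e.g. renaming 100 files), which is where the step savings come from.
        print(f"run code:\n{action.program}")
    else:
        print(f"gui {action.kind} on {action.target} {action.text}".strip())


if __name__ == "__main__":
    execute(GuiAction("click", "Save button"))
    execute(CodeAction("import os\nprint(sorted(os.listdir('.')))"))
```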



Viraj Prabhu reposted

🚀 Computer-using agents represent a powerful new paradigm for human-computer interaction. Over the past year, we’ve explored multiple approaches to tackle the key challenges in building robust CUA systems. 12/2024 we released Aguvis (x.com/CaimingXiong/s…) 07/2024 we released…


Meet AGUVIS: A pure vision-based framework for autonomous GUI agents, operating seamlessly across web, desktop, and mobile platforms without UI code. Key Features & Contributions 🔍 Pure Vision Framework: First fully autonomous pure vision GUI agent capable of performing tasks…



Happening now in 208B, come check out the first EMACS workshop! #CVPR2025


Join us at the first-ever EMACS workshop @CVPR! 🚨 Submissions open March 5: tinyurl.com/emacs25 See you in Nashville! 🎸 #CVPR2025



Viraj Prabhu reposted

🚨🚨 Paper submission deadline extended to May 4. Submit your work (in-progress or complete!) to the EMACS workshop @CVPR2025 in Nashville! Submission link: tinyurl.com/emacs2025submit #CVPR2025 #GenerativeAI #bias

🚀 Excited about how generative AI can power experimental (not just observational) audits of ML systems that reveal actionable insights into performance and bias? Join us at the first-ever EMACS workshop @CVPR2025 in Nashville! 🌟 Speakers & submissions: sites.google.com/view/emacs2025/



Viraj Prabhu reposted

🚀 Excited about how generative AI can power experimental (not just observational) audits of ML systems that reveal actionable insights into performance and bias? Join us at EMACS (Experimental Model Auditing with Controllable Synthesis) workshop @CVPR! sites.google.com/view/emacs2025/


Join us at the first-ever EMACS workshop @CVPR! 🚨 Submissions open March 5: tinyurl.com/emacs25 See you in Nashville! 🎸 #CVPR2025

🚀 Excited about how generative AI can power experimental (not just observational) audits of ML systems that reveal actionable insights into performance and bias? Join us at the first-ever EMACS workshop @CVPR2025 in Nashville! 🌟 Speakers & submissions: sites.google.com/view/emacs2025/



Viraj Prabhu reposted

Introducing Gaze-LLE, a new model for gaze target estimation built on top of a frozen visual foundation model! Gaze-LLE achieves SOTA results on multiple benchmarks while learning minimal parameters, and shows strong generalization paper: arxiv.org/abs/2412.09586
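
The recipe the tweet describes (a frozen visual foundation model plus a small learned head) follows a common pattern; here is a minimal sketch of that pattern in PyTorch, assuming a generic feature-vector backbone. This is not the actual Gaze-LLE architecture or code:

```python
# Rough sketch of "frozen foundation model + small learned head";
# NOT the Gaze-LLE implementation (see arxiv.org/abs/2412.09586 for the real one).
import torch
import torch.nn as nn


class FrozenBackboneGazeHead(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 768):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the foundation model
            p.requires_grad = False
        # Only this small head is trained, so very few parameters are learned.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.GELU(), nn.Linear(256, 2)
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                   # backbone stays frozen
            feats = self.backbone(images)       # (B, feat_dim) features
        return self.head(feats)                 # (B, 2) predicted gaze target (x, y)


if __name__ == "__main__":
    # Stand-in backbone: any frozen image encoder that outputs a feature vector.
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 768))
    model = FrozenBackboneGazeHead(backbone)
    print(model(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 2])
```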


Looking forward to some Miami sun this week at #EMNLP2024, my first NLP conference in ~7 years! ☀️ HMU if you’d like to learn more about our work at @SFResearch or just meet/catch up! 🍹


Viraj Prabhu reposted

🤔 Ever wondered why merging LoRA models is trickier than fully-finetuned ones? 🔍 We explore this and discover that poor alignment between LoRA models leads to subpar merging. 💡 The solution? KnOTS 🪢: our latest work that uses SVD to improve alignment and boosts SOTA merging methods.

Model merging is tricky when model weights aren’t aligned. Introducing KnOTS 🪢: a gradient-free framework to merge LoRA models. KnOTS is plug-and-play, boosting SoTA merging methods by up to 4.3% 🚀 📜: arxiv.org/abs/2410.19735 💻: github.com/gstoica27/KnOTS
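
A toy sketch of the idea in these tweets, under the assumption that each task contributes a low-rank weight update ΔW_i = B_i A_i: express the updates in a shared SVD basis so they are aligned, merge the aligned components, and map back to weight space. This is only an illustration of the intuition, not the official KnOTS implementation (see github.com/gstoica27/KnOTS):

```python
# Toy illustration of "align LoRA deltas via a shared SVD basis, then merge";
# not the official KnOTS code. Simple averaging stands in for a SoTA merger.
import torch


def merge_lora_deltas(deltas: list) -> torch.Tensor:
    """Merge per-task LoRA weight updates (each of shape d_out x d_in)."""
    # Concatenate the task updates and take an SVD to get a shared basis U.
    stacked = torch.cat(deltas, dim=1)                  # (d_out, k * d_in)
    U, S, Vh = torch.linalg.svd(stacked, full_matrices=False)
    # Express each task update in the shared basis (alignment step),
    # then combine the aligned coefficients; averaging is a placeholder
    # for whatever merging method one plugs in.
    coeffs = [U.T @ d for d in deltas]
    merged_coeff = torch.stack(coeffs).mean(dim=0)
    return U @ merged_coeff                             # back to weight space


if __name__ == "__main__":
    d_out, d_in, rank = 64, 32, 4
    deltas = [torch.randn(d_out, rank) @ torch.randn(rank, d_in) for _ in range(3)]
    print(merge_lora_deltas(deltas).shape)  # torch.Size([64, 32])
```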



Viraj Prabhu reposted

Introducing EgoMimic - just wear a pair of Project Aria @meta_aria smart glasses 👓 to scale up your imitation learning datasets! Check out what our robot can do. A thread below👇


Viraj Prabhu reposted

Evaluate hallucination in your VLMs using our new benchmark

🚨🚨🚨Introducing PROVE: A new programmatic benchmark for evaluating vision-language models (VLMs). VLMs often provide responses that are unhelpful, contain false claims about the image, or both. However, benchmarking this in the wild can be surprisingly hard! Enter PROVE,…


