Yeda Song

@__runamu__

Multimodal Agents for the Real World: GUI Agents, VLM, and RL @ UMich 🇺🇸

Ann Arbor, Michigan, USA

yedasong.com

Joined January 2022

20 Posts, 182 Followers, 225 Following

Pinned

Yeda Song

@__runamu__

May 27

🔥 GUI agents struggle with real-world mobile tasks. We present MONDAY—a diverse, large-scale dataset built via an automatic pipeline that transforms internet videos into GUI agent data. ✅ VLMs trained on MONDAY show strong generalization ✅ Open data (313K steps) (1/7) 🧵 #CVPR

Yeda Song reposted

Aviral Kumar

@aviral_kumar2

Sep 9

🚨🚨New paper on core RL: a way to train value-functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried. 🧵⬇️

Yeda Song reposted

Seohong Park

@seohong_park

Jul 15

Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL. Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)!

Seohong Park

@seohong_park

Feb 5

Excited to introduce flow Q-learning (FQL)! Flow Q-learning is a *simple* and scalable data-driven RL method that trains an expressive policy with flow matching. Paper: arxiv.org/abs/2502.02538 Project page: seohong.me/projects/fql/ Thread ↓

Yeda Song

@__runamu__

Jul 12

✨Two life updates✨ 1. Started my internship at @LG_AI_Research in Ann Arbor, Michigan — Advancing AI for a better life! 🔮 2. Advanced to PhD candidacy at UMich CSE. This means I’ve completed my coursework and passed the qualification process. 🙌

Yeda Song reposted

Andrej Karpathy

@karpathy

Jun 27

The race for LLM "cognitive core" - a few billion param model that maximally sacrifices encyclopedic knowledge for capability. It lives always-on and by default on every computer as the kernel of LLM personal computing. Its features are slowly crystalizing: - Natively multimodal…

Omar Sanseviero

@osanseviero

Jun 26

I’m so excited to announce Gemma 3n is here! 🎉 🔊Multimodal (text/audio/image/video) understanding 🤯Runs with as little as 2GB of RAM 🏆First model under 10B with @lmarena_ai score of 1300+ Available now on @huggingface, @kaggle, llama.cpp, ai.dev, and more

Yeda Song reposted

Sangwoo Mo

@sangwoomo

Jun 16

Can scaling data and models alone solve computer vision? 🤔 Join us at the SP4V Workshop at #ICCV2025 in Hawaii to explore this question! 🎤 Speakers: @danfei_xu, @joaocarreira, @jiajunwu_cs, Kristen Grauman, @sainingxie, @vincesitzmann 🔗 sp4v.github.io

Yeda Song reposted

MichiganAI

@michigan_AI

Jun 10

We're heading to #CVPR2025! 📰Curious about what’s coming? Take a look at our list of accepted papers and come to meet the authors! Get ready for innovative #AI research and fresh insights!

Yeda Song reposted

Furong Huang

@furongh

Jun 10

Excited to speak at the Workshop on Computer Vision in the Wild @CVPR 2025! 🎥🌍 🗓️ June 11 | 📍 Room 101 B, Music City Center, Nashville, TN 🎸 🧠 Talk: From Perception to Action: Building World Models for Generalist Agents Let’s connect if you're around! #CVPR2025 #robotics…

Yeda Song reposted

Jianwei Yang

@jw2yang4ai

Apr 28

🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 computer-vision-in-the-wild.github.io/cvpr-2025/ ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof.…

Yeda Song

@__runamu__

Jun 10

Arrived in Nashville for #CVPR 🤠 Excited to present MONDAY, a collaboration with @LG_AI_Research! 📍 MMFM Workshop - Thu, 9:40 AM 📍 Main Conference - Fri, 4:00 PM Let’s connect and chat!🤝 Also exploring Summer 2026 internships 🔍 MONDAY website: monday-dataset.github.io

Yeda Song reposted

Shunyu Yao

@ShunyuYao12

Apr 14

I finally wrote another blogpost: ysymyth.github.io/The-Second-Hal… AI just keeps getting better over time, but NOW is a special moment that i call “the halftime”. Before it, training > eval. After it, eval > training. The reason: RL finally works. Lmk ur feedback so I’ll polish it.

Yeda Song reposted

Kenneth Li

@ke_li_2021

Sep 9, 2024

LLM chatbots are moving fast, but how do we make them better? In my new blog at The Gradient, I argue that an important next step is giving them a sense of "purpose."

Yeda Song reposted

Rada Mihalcea

@radamihalcea

Sep 6, 2024

I love our Michigan AI Lab @michigan_AI! A group of people who not only do some of the coolest research in AI, but also care for and about each other, and enjoy each other’s company. A picture from this week’s fun picnic. ❤️