
Yong Jae Lee

@yong_jae_lee

Professor, Computer Sciences, UW-Madison. I am a computer vision and machine learning researcher.

Super excited for Xueyan @xyz2maureen as she begins her faculty career and builds her lab at Tsinghua! A phenomenal opportunity to work with a brilliant and deeply thoughtful researcher.

I will join Tsinghua University, College of AI, as an Assistant Professor in the coming month. I am actively looking for 2026 spring interns and future PhDs (ping me if you are in #NeurIPS). It has been an incredible journey of 10 years since I attended an activity organized by…



Yong Jae Lee reposted

As an AI researcher, are you interested in tracking trends across CV/NLP/ML, robotics, and even Nature/Science? Our paper “Real Deep Research for AI, Robotics & Beyond” automates survey generation and trend/topic discovery across fields 🔥 Explore RDR at realdeepresearch.github.io


Yong Jae Lee reposted

🚀 Excited to present LLaVA-PruMerge this morning at ICCV (can you imagine, finally!). LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models arxiv.org/abs/2403.15388 Exhibit Hall I #287, 11:15am-1:15pm. Work done with amazing @yuzhang_shang, @yong_jae_lee et al.


Yong Jae Lee reposted

Presenting CuRe at the #ICCV2025 main conference from 2:30-4 PM at Ex Hall II #76. Drop by if you’re around!

Training text-to-image models? Want your models to represent cultures across the globe but don't know how to systematically evaluate them? Introducing ⚕️CuRe⚕️, a new benchmark and scoring suite for cultural representativeness through the lens of information gain (1/10)



Here is the final decision for one of our NeurIPS D&B ACs-accepted-but-PCs-rejected papers, with the vague message mentioning some kind of ranking. Why was the ranking necessary? Venue capacity? If so, this sets a concerning precedent. @NeurIPSConf


I have two D&B papers in the same situation: ACs recommended accept, but PCs overruled and rejected with the same exact vague reason that you got. They should at least provide a proper reason.



Yong Jae Lee reposted

My students called the new CDIS building “state-of-the-art”. I thought they were exaggerating. Today I moved in and saw it for myself. Wow. Photos cannot capture the beauty of the design.


Yong Jae Lee reposted

#ICCV2025 Introducing X-Fusion: Introducing New Modality to Frozen Large Language Models. It is a novel framework that adapts pretrained LLMs (e.g., LLaMA) to new modalities (e.g., vision) while retaining their language capabilities and world knowledge! (1/n) Project Page:…


Yong Jae Lee reposted

LLaVA-PruMerge, the first work on visual token reduction for MLLMs, finally got accepted after being cited 146 times since last year. Congrats to the team! @yuzhang_shang @yong_jae_lee See how to make MLLM inference much cheaper while maintaining performance: llava-prumerge.github.io


Visual tokens in current large multimodal models are spatially redundant, as indicated by their sparse attention maps. LLaVA-PruMerge proposes to first prune and then merge visual tokens, which can compress the visual tokens by 18 times (14 times on MME/TextVQA) on average while…

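For readers skimming the thread, here is a minimal conceptual sketch of the prune-then-merge idea described above, assuming PyTorch-style tensors. The function name, the fixed keep ratio, and the nearest-neighbor merge rule are illustrative assumptions for exposition, not the released LLaVA-PruMerge implementation (see llava-prumerge.github.io and arxiv.org/abs/2403.15388 for the actual method).

```python
import torch
import torch.nn.functional as F

def prune_and_merge(visual_tokens: torch.Tensor,
                    cls_attention: torch.Tensor,
                    keep_ratio: float = 0.06) -> torch.Tensor:
    """Conceptual prune-then-merge token reduction (illustrative only).

    visual_tokens: (N, D) patch embeddings from the vision encoder.
    cls_attention: (N,)   attention each patch receives from the [CLS]
                          token, used here as an importance score.
    Returns roughly N * keep_ratio compressed tokens.
    """
    n = visual_tokens.shape[0]
    n_keep = max(1, int(n * keep_ratio))

    # Prune: keep only the most-attended (least redundant) tokens.
    keep_idx = torch.topk(cls_attention, n_keep).indices
    kept = visual_tokens[keep_idx]                       # (n_keep, D)

    # Merge: fold each pruned token into its most similar kept token,
    # so the discarded tokens still contribute their information.
    mask = torch.ones(n, dtype=torch.bool)
    mask[keep_idx] = False
    pruned = visual_tokens[mask]                         # (N - n_keep, D)

    sim = F.normalize(pruned, dim=-1) @ F.normalize(kept, dim=-1).T
    assign = sim.argmax(dim=-1)                          # nearest kept token

    merged = kept.clone()
    counts = torch.ones(n_keep, 1, dtype=kept.dtype)
    merged.index_add_(0, assign, pruned)
    counts.index_add_(0, assign,
                      torch.ones(pruned.shape[0], 1, dtype=kept.dtype))
    return merged / counts                               # (n_keep, D)
```

Note that the actual method is adaptive (per the paper title), deciding how many tokens to keep per image from the attention statistics; the constant keep_ratio above is purely a simplification.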


Yong Jae Lee reposted

Training text-to-image models? Want your models to represent cultures across the globe but don't know how to systematically evaluate them? Introducing ⚕️CuRe⚕️, a new benchmark and scoring suite for cultural representativeness through the lens of information gain (1/10)


Thank you @_akhaliq for sharing our work!

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection


Congratulations Dr. Mu Cai @MuCai7! Mu is my 8th PhD student and first to start in my group at UW–Madison after my move a few years ago. He made a number of important contributions in multimodal models during his PhD, and recently joined Google DeepMind. I will miss you a lot Mu!


Yong Jae Lee reposted

🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 computer-vision-in-the-wild.github.io/cvpr-2025/ ⭐We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof.…


Yong Jae Lee reposted

Public service announcement: Multimodal LLMs are really bad at understanding images with *precision*. x.com/lukeprog/statu… A thread🧵: 1/13.

Tyler Cowen: "I've seen enough, I'm calling it, o3 is AGI" Meanwhile, o3 in response to the first prompt I give it:



Congratulations again @MuCai7!! So well deserved. I will miss having you in the lab.

I am thrilled to join @GoogleDeepMind as a Research Scientist and continue working on multimodal research!


