#visionlanguage search results

Back from the break with Phillip Isola @phillip_isola on “On the Perceptual Distance Between Images and Text.” A fascinating and interactive look at how models (and humans!) measure similarity 👏🏻 #HiCV2025 #ICCV2025 #VisionLanguage
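For readers who want to poke at the idea themselves: a common computational proxy for the "perceptual distance" between an image and a caption is cosine similarity in a shared embedding space such as CLIP's. A minimal sketch using the Hugging Face transformers CLIP API (the checkpoint name, image path, and candidate captions below are illustrative assumptions, and CLIP similarity is only one proxy for the distances discussed in the talk):

```python
# Minimal sketch: scoring image-text similarity with CLIP, one common proxy
# for the "perceptual distance" between an image and a caption.
# Assumptions: the openai/clip-vit-base-patch32 checkpoint and a local photo.jpg.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
texts = ["a dog playing in the snow", "a city skyline at night"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Scaled cosine similarities between the image embedding and each text
# embedding; higher means the pair is "closer" in the shared space.
print(outputs.logits_per_image)
print(outputs.logits_per_image.softmax(dim=-1))  # normalized over the captions
```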


Day 168 Meet MiniCPM-V 2.6, the latest and most capable model in the MiniCPM-V series! This powerhouse surpasses GPT-4V in single-image, multi-image, and video understanding. #AI #MachineLearning #VisionLanguage #MiniCPMV #TechInnovation #RealTimeVideo #MultimodalLLMs #GPT4V


In 2023, researchers launched VisIT-Bench, a benchmark with 592 vision-language tasks spanning 70 categories like plot analysis and art knowledge. #AI #VisionLanguage @Stanford


🚀 Thrilled to announce our paper "TG-LLaVA: Text Guided LLaVA" accepted by @AAAI! We enhance vision encoders with text guidance, boosting performance without extra data. Huge thanks to the team! #AI #VisionLanguage


🔥 Discover the fascinating world of Multimodal Foundation Models, with the journey "From Specialists to General-Purpose Assistants" 🌐 Dive into the evolution of large models in #ComputerVision & #VisionLanguage! Paper: arxiv.org/abs/2309.10020 Tutorial: vlp-tutorial.github.io/2023/


Multimodal Foundation Models: From Specialists to General-Purpose Assistants paper page: huggingface.co/papers/2309.10… paper presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities,…

📢Join us for our #ImageInWords poster presentation on Nov 12th, 11:00-12:30pm at #EMNLP. We'll be diving deep into hyper-detailed image descriptions and their impact on #VisionLanguage models. See you there!👋 #NLProc #ComputerVision @emnlpmeeting @AndreaDBurns @GoogleDeepMind 🧵


When you feel the Holy Spirit trying to drop a vision on you, but you can't connect... go lie down and take a nap. Your mind is in the way. Once your mind shuts down, your spirit and the Holy Spirit will talk and you'll access that vision. #VisionLanguage #PrayingProphet


🚨 New model alert! 🚨 We've added OpenGVLab InternVL3_5-2B! It's a vision-language model. Get it running in LocalAI with: `local-ai run opengvlab_internvl3_5-2b` 😉 #LocalAI #VisionLanguage #NewModel
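Once the model is pulled, LocalAI serves it behind its OpenAI-compatible API (by default on localhost:8080), so a vision request can be sent with the standard chat-completions message format. A minimal sketch, assuming the default port, the gallery model name `opengvlab_internvl3_5-2b` from the tweet, and an illustrative image URL:

```python
# Minimal sketch: querying the InternVL3.5-2B model served by LocalAI through
# its OpenAI-compatible /v1/chat/completions endpoint.
# Assumptions: LocalAI on localhost:8080 (its default) and the model registered
# as "opengvlab_internvl3_5-2b"; adjust host, model name, and image URL as needed.
import requests

payload = {
    "model": "opengvlab_internvl3_5-2b",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```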


🔍 Today's top pick from @powerdrill Research Digest: 'Training-Free Unsupervised Prompt for Vision-Language Models'. Check out the link for a summary: app.powerdrill.ai/s/1jB88R #AI #VisionLanguage #Research


With Meta's Llama-3.2-90B Vision Instruct model, you can upload an image and engage in a fascinating conversation about it! 🎨📸 This cutting-edge vision-language model redefines how we interact with images, unlocking new possibilities. #AI #VisionLanguage #Innovation #ALX_AI
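The image-chat workflow described here can be reproduced locally with the transformers library, which exposes Llama 3.2 Vision through MllamaForConditionalGeneration. A minimal sketch, assuming access to the gated meta-llama/Llama-3.2-90B-Vision-Instruct checkpoint (which needs multiple GPUs; the 11B variant uses the same API) and a local photo.jpg:

```python
# Minimal sketch: chatting about an image with Llama 3.2 Vision Instruct via
# transformers. Assumptions: access to the gated meta-llama checkpoint, enough
# GPU memory for the 90B weights (the 11B variant works the same way), and a
# local "photo.jpg".
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is happening in this picture?"},
    ]}
]

# Build the chat prompt, pair it with the image, and generate a reply.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```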


MiniGPT-4: Enhancing vision-language understanding using advanced large language models for multi-task learning. #AI #VisionLanguage #LLM #MachineLearning


Power up sustainably with #MiniGPT4. The smarter, smaller sibling of #GPT4 with better #visionlanguage understanding, it is perfect for image captioning and visual question-answering tasks. Discover the benefits at bit.ly/3WXNVxq #USAII #AI #SoftwareDeveloper #gpt4


1/5 🌐"On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?" This paper explores the necessity of prompt learning for VLMs. #AI #MachineLearning #VisionLanguage


🚀CFP: Vision-and-Language Intelligence Explore the frontier of multimodal AI — from image captioning & VQA to foundation models & real-world applications. 🗣 Submit your work soon! 🔗oaepublish.com/specials/ir.10… #VisionLanguage #MultimodalAI #AI #CFP #DeepLearning


1/5 🖼️📚Introducing a new approach for vision-language pre-training: "Efficient Vision-Language Pre-training by Cluster Masking." This method enhances visual-language contrastive learning with a novel masking technique. #AI #MachineLearning #VisionLanguage


HoneyBee: A 2.5M-sample vision-language reasoning dataset boosts VL model accuracy by up to +38.9%, with smarter data curation (context, CoT, scaling) and 73% lower decoding cost. #AI #VisionLanguage

New paper 📢 Most powerful vision-language (VL) reasoning datasets remain proprietary 🔒, hindering efforts to study their principles and develop similarly effective datasets in the open 🔓. Thus, we introduce HoneyBee, a 2.5M-example dataset created through careful data…


Join us as we explore how to build trustworthy, robust, and safe vision-language generative models (e.g., Text-to-Image & Image-to-Text models). #ICCV2025 #GenerativeAI #VisionLanguage #AITrustworthiness #ResponsibleAI


🚀 Temporal Understanding has been a missing piece of the puzzle for multimodal large language models. 🧵 0/n Proud to announce our work, TimeWarp, a novel synthetic temporal preference data generation pipeline. #VideoLLMs #VisionLanguage #Genai #LLMs #TemporalUnderstanding


The special contribution? Our dataset goes beyond perception: we curated cultural reasoning questions that require knowledge to answer, not just visual cues. Check it out here 👉 seeingculture-benchmark.github.io #EMNLP #visionlanguage #vlm #benchmark #computervision


[1/4] We tested InternVL3.5-aligned GPT-OSS on MMMU and found visual alignment gaps: misreads, shallow grounding, and repetition. It’s an early step, not the solution—good scaffolding, weak seeing. #Multimodal #VisionLanguage


Thrilled to see ERNIE 4.5 VL earn a top spot on the latest SuperCLUE-VLM benchmark. Time to build something new with vision!👀 #AI #Multimodal #VisionLanguage #LLM #Benchmark #ERNIE


Discover FastVLM, a breakthrough in Vision Language Models that boosts image resolution and speeds up processing, making text-rich image understanding more efficient! 🚀📸 #AI #VisionLanguage #TechInnovation rpst.cc/oMa5DZ


Great to see NVIDIA releasing a huge vision-language dataset with 3M samples! Open datasets like this will help accelerate research in OCR, VQA and captioning tasks. Excited to see what developers build with it. #AI #VisionLanguage

We just released 3 million samples of high quality vision language model training dataset for use cases such as: 📄 optical character recognition (OCR) 📊 visual question answering (VQA) 📝 captioning 🤗 Learn more: nvda.ws/4oyfevu 📥 Download: nvda.ws/4fz2gtB




💡 Dynamic Token Compression in DeepSeek-VL: ✂️ 40% fewer tokens for high-res images 📊 +3.1% accuracy on TextVQA Code: github.com/deepseek-ai/vl… #VisionLanguage @PublicAI_ @PublicAIData @DataBabies333
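"Token compression" here means cutting down the number of visual patch tokens the language model has to process for high-resolution images. As a generic illustration only (not DeepSeek-VL's actual algorithm; see the linked repo for that, and note the fixed 4x ratio below is purely illustrative, unlike the 40% figure in the tweet), a sketch that merges each 2x2 block of patch tokens by average pooling:

```python
# Generic illustration of visual-token compression: merge every 2x2 block of
# patch tokens into one via average pooling, shortening the sequence the LLM
# sees. This is NOT DeepSeek-VL's actual mechanism, just the basic idea.
import torch
import torch.nn.functional as F

def compress_patch_tokens(tokens: torch.Tensor, grid: int) -> torch.Tensor:
    """tokens: (batch, grid*grid, dim) patch tokens laid out row-major."""
    b, n, d = tokens.shape
    assert n == grid * grid, "expected a square token grid"
    x = tokens.transpose(1, 2).reshape(b, d, grid, grid)  # (b, d, H, W)
    x = F.avg_pool2d(x, kernel_size=2, stride=2)          # (b, d, H/2, W/2)
    return x.flatten(2).transpose(1, 2)                   # (b, n/4, d)

tokens = torch.randn(1, 24 * 24, 1024)          # e.g. 576 patch tokens
print(compress_patch_tokens(tokens, 24).shape)  # torch.Size([1, 144, 1024])
```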


🚨 @ICCVConference 2025🚨 Happy to share that our paper: Visual Modality Prompt for Adapting Vision-Language Object Detectors was accepted at #ICCV2025 📄 Paper: arxiv.org/abs/2412.00622 💻 Code: github.com/heitorrapela/M… #ComputerVision #VisionLanguage #ObjectDetection #VLMs


MY FIRST STEP IS TO PLANT MY SEEDS WHEREVER I CAN PLANT THEM #ViSiONLANGUAGE LIVING 4EVA IS DA GOAL #2000NOW



1/5 🗣️"MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs." This benchmark evaluates LVLMs in complex dialog scenarios involving multiple images. #AI #MachineLearning #VisionLanguage


🔍 𝗪𝗵𝗮𝘁’𝘀 𝗻𝗲𝘄? We propose a new approach to fine-tuning large Vision-Language Models (VLMs) on resource-constrained clients in Federated Learning—essential for healthcare, where privacy matters most. #FederatedLearning #VisionLanguage #FoundationModels
