#visionlanguage search results

Back from the break with Phillip Isola @phillip_isola on “On the Perceptual Distance Between Images and Text.” A fascinating and interactive look at how models (and humans!) measure similarity 👏🏻 #HiCV2025 #ICCV2025 #VisionLanguage
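For readers who want to poke at the idea themselves: a common computational proxy for the "perceptual distance" between an image and a caption is cosine similarity in a shared embedding space such as CLIP's. A minimal sketch using the Hugging Face transformers CLIP API (the checkpoint name, image path, and candidate captions below are illustrative assumptions, and CLIP similarity is only one proxy for the distances discussed in the talk):

```python
# Minimal sketch: scoring image-text similarity with CLIP, one common proxy
# for the "perceptual distance" between an image and a caption.
# Assumptions: the openai/clip-vit-base-patch32 checkpoint and a local photo.jpg.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
texts = ["a dog playing in the snow", "a city skyline at night"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Scaled cosine similarities between the image embedding and each text
# embedding; higher means the pair is "closer" in the shared space.
print(outputs.logits_per_image)
print(outputs.logits_per_image.softmax(dim=-1))  # normalized over the captions
```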


Day 168 Meet MiniCPM-V 2.6, the latest and most capable model in the MiniCPM-V series! This powerhouse surpasses GPT-4V in single-image, multi-image, and video understanding. #AI #MachineLearning #VisionLanguage #MiniCPMV #TechInnovation #RealTimeVideo #MultimodalLLMs #GPT4V


In 2023, researchers launched VisIT-Bench, a benchmark with 592 vision-language tasks spanning 70 categories like plot analysis and art knowledge. #AI #VisionLanguage @Stanford


🚀 Thrilled to announce our paper "TG-LLaVA: Text Guided LLaVA" accepted by @AAAI! We enhance vision encoders with text guidance, boosting performance without extra data. Huge thanks to the team! #AI #VisionLanguage


🔥 Discover the fascinating world of Multimodal Foundation Models, with the journey "From Specialists to General-Purpose Assistants" 🌐 Dive into the evolution of large models in #ComputerVision & #VisionLanguage! Paper: arxiv.org/abs/2309.10020 Tutorial: vlp-tutorial.github.io/2023/


Multimodal Foundation Models: From Specialists to General-Purpose Assistants paper page: huggingface.co/papers/2309.10… paper presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities,…

📢Join us for our #ImageInWords poster presentation on Nov 12th, 11:00-12:30pm at #EMNLP. We'll be diving deep into hyper-detailed image descriptions and their impact on #VisionLanguage models. See you there!👋 #NLProc #ComputerVision @emnlpmeeting @AndreaDBurns @GoogleDeepMind 🧵


When you feel the Holy Spirit trying to drop a vision on you, but you can't connect... go lie down and take a nap. Your mind is in the way. Once your mind shuts down, your spirit and the Holy Spirit will talk and you'll access that vision. #VisionLanguage #PrayingProphet


🚨 New model alert! 🚨 We've added OpenGVLab InternVL3_5-2B! It's a vision-language model. Get it running in LocalAI with: `local-ai run opengvlab_internvl3_5-2b` 😉 #LocalAI #VisionLanguage #NewModel
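Once the model is pulled, LocalAI serves it behind its OpenAI-compatible API (by default on localhost:8080), so a vision request can be sent with the standard chat-completions message format. A minimal sketch, assuming the default port, the gallery model name `opengvlab_internvl3_5-2b` from the tweet, and an illustrative image URL:

```python
# Minimal sketch: querying the InternVL3.5-2B model served by LocalAI through
# its OpenAI-compatible /v1/chat/completions endpoint.
# Assumptions: LocalAI on localhost:8080 (its default) and the model registered
# as "opengvlab_internvl3_5-2b"; adjust host, model name, and image URL as needed.
import requests

payload = {
    "model": "opengvlab_internvl3_5-2b",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```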


🔍 Today's top pick from @powerdrill Research Digest: 'Training-Free Unsupervised Prompt for Vision-Language Models'. Check out the link for a summary: app.powerdrill.ai/s/1jB88R #AI #VisionLanguage #Research


With Meta's Llama-3.2-90B Vision Instruct model, you can upload an image and engage in a fascinating conversation about it! 🎨📸 This cutting-edge vision-language model redefines how we interact with images, unlocking new possibilities. #AI #VisionLanguage #Innovation #ALX_AI
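The image-chat workflow described here can be reproduced locally with the transformers library, which exposes Llama 3.2 Vision through MllamaForConditionalGeneration. A minimal sketch, assuming access to the gated meta-llama/Llama-3.2-90B-Vision-Instruct checkpoint (which needs multiple GPUs; the 11B variant uses the same API) and a local photo.jpg:

```python
# Minimal sketch: chatting about an image with Llama 3.2 Vision Instruct via
# transformers. Assumptions: access to the gated meta-llama checkpoint, enough
# GPU memory for the 90B weights (the 11B variant works the same way), and a
# local "photo.jpg".
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is happening in this picture?"},
    ]}
]

# Build the chat prompt, pair it with the image, and generate a reply.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```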


MiniGPT-4: Enhancing vision-language understanding using advanced large language models for multi-task learning. #AI #VisionLanguage #LLM #MachineLearning


Power up sustainably with #MiniGPT4. The smarter, smaller sibling of #GPT4 with better #visionlanguage understanding, it is perfect for image captioning and visual question-answering tasks. Discover the benefits at bit.ly/3WXNVxq #USAII #AI #SoftwareDeveloper #gpt4


1/5 🌐"On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?" This paper explores the necessity of prompt learning for VLMs. #AI #MachineLearning #VisionLanguage


🚀CFP: Vision-and-Language Intelligence Explore the frontier of multimodal AI — from image captioning & VQA to foundation models & real-world applications. 🗣 Submit your work soon! 🔗oaepublish.com/specials/ir.10… #VisionLanguage #MultimodalAI #AI #CFP #DeepLearning


1/5 🖼️📚Introducing a new approach for vision-language pre-training: "Efficient Vision-Language Pre-training by Cluster Masking." This method enhances visual-language contrastive learning with a novel masking technique. #AI #MachineLearning #VisionLanguage


HoneyBee: A 2.5M-sample vision-language reasoning dataset boosts VL model accuracy by up to +38.9%, with smarter data curation (context, CoT, scaling) and 73% lower decoding cost. #AI #VisionLanguage

New paper 📢 Most powerful vision-language (VL) reasoning datasets remain proprietary 🔒, hindering efforts to study their principles and develop similarly effective datasets in the open 🔓. Thus, we introduce HoneyBee, a 2.5M-example dataset created through careful data…


Join us as we explore how to build trustworthy, robust, and safe vision-language generative models (e.g., Text-to-Image & Image-to-Text models). #ICCV2025 #GenerativeAI #VisionLanguage #AITrustworthiness #ResponsibleAI


🚀 Temporal Understanding has been a missing piece of the puzzle for multimodal large language models. 🧵 0/n Proud to announce our work, TimeWarp, a novel synthetic temporal preference data generation pipeline. #VideoLLMs #VisionLanguage #Genai #LLMs #TemporalUnderstanding


The special contribution? Our dataset goes beyond perception: we curated cultural reasoning questions that require knowledge to answer, not just visual cues. Check it out here 👉 seeingculture-benchmark.github.io #EMNLP #visionlanguage #vlm #benchmark #computervision


[1/4] We tested InternVL3.5-aligned GPT-OSS on MMMU and found visual alignment gaps: misreads, shallow grounding, and repetition. It’s an early step, not the solution—good scaffolding, weak seeing. #Multimodal #VisionLanguage


Thrilled to see ERNIE 4.5 VL earn a top spot on the latest SuperCLUE-VLM benchmark. Time to build something new with vision!👀 #AI #Multimodal #VisionLanguage #LLM #Benchmark #ERNIE


Discover FastVLM, a breakthrough in Vision Language Models that boosts image resolution and speeds up processing, making text-rich image understanding more efficient! 🚀📸 #AI #VisionLanguage #TechInnovation rpst.cc/oMa5DZ


Great to see NVIDIA releasing a huge vision-language dataset with 3M samples! Open datasets like this will help accelerate research in OCR, VQA and captioning tasks. Excited to see what developers build with it. #AI #VisionLanguage

We just released 3 million samples of high quality vision language model training dataset for use cases such as: 📄 optical character recognition (OCR) 📊 visual question answering (VQA) 📝 captioning 🤗 Learn more: nvda.ws/4oyfevu 📥 Download: nvda.ws/4fz2gtB




💡 Dynamic Token Compression in DeepSeek-VL: ✂️ 40% fewer tokens for high-res images 📊 +3.1% accuracy on TextVQA Code: github.com/deepseek-ai/vl… #VisionLanguage @PublicAI_ @PublicAIData @DataBabies333
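"Token compression" here means cutting down the number of visual patch tokens the language model has to process for high-resolution images. As a generic illustration only (not DeepSeek-VL's actual algorithm; see the linked repo for that, and note the fixed 4x ratio below is purely illustrative, unlike the 40% figure in the tweet), a sketch that merges each 2x2 block of patch tokens by average pooling:

```python
# Generic illustration of visual-token compression: merge every 2x2 block of
# patch tokens into one via average pooling, shortening the sequence the LLM
# sees. This is NOT DeepSeek-VL's actual mechanism, just the basic idea.
import torch
import torch.nn.functional as F

def compress_patch_tokens(tokens: torch.Tensor, grid: int) -> torch.Tensor:
    """tokens: (batch, grid*grid, dim) patch tokens laid out row-major."""
    b, n, d = tokens.shape
    assert n == grid * grid, "expected a square token grid"
    x = tokens.transpose(1, 2).reshape(b, d, grid, grid)  # (b, d, H, W)
    x = F.avg_pool2d(x, kernel_size=2, stride=2)          # (b, d, H/2, W/2)
    return x.flatten(2).transpose(1, 2)                   # (b, n/4, d)

tokens = torch.randn(1, 24 * 24, 1024)          # e.g. 576 patch tokens
print(compress_patch_tokens(tokens, 24).shape)  # torch.Size([1, 144, 1024])
```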


🚨 @ICCVConference 2025🚨 Happy to share that our paper: Visual Modality Prompt for Adapting Vision-Language Object Detectors was accepted at #ICCV2025 📄 Paper: arxiv.org/abs/2412.00622 💻 Code: github.com/heitorrapela/M… #ComputerVision #VisionLanguage #ObjectDetection #VLMs


MY FIRST STEP IS TO PLANT MY SEEDS WHEREVER I CAN PLANT THEM #ViSiONLANGUAGE LIVING 4EVA IS DA GOAL #2000NOW



1/5 🗣️"MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs." This benchmark evaluates LVLMs in complex dialog scenarios involving multiple images. #AI #MachineLearning #VisionLanguage


🔍 𝗪𝗵𝗮𝘁’𝘀 𝗻𝗲𝘄? We propose a new approach to fine-tuning large Vision-Language Models (VLMs) on resource-constrained clients in Federated Learning—essential for healthcare, where privacy matters most. #FederatedLearning #VisionLanguage #FoundationModels
