Search results for #visionlanguagemodels

Vlad Ruso PhD

Sep 6

Hugging Face FineVision: The Ultimate Multimodal Dataset for Vision-Language Model Training #FineVision #VisionLanguageModels #HuggingFace #AIResearch #MultimodalDataset itinai.com/hugging-face-f… Understanding the Impact of FineVision on Vision-Language Models Hugging Face has …

JohnSnowLabs

@JohnSnowLabs

Aug 30

Read here: hubs.li/Q03Fs2V30 #MedicalAI #VisionLanguageModels #HealthcareAI #MedicalImaging #ClinicalDecisionSupport #GenerativeAI

1/ 🗑️ in, 🗑️ out With advances in #VisionLanguageModels, there is growing interest in automated #RadiologyReporting. It's great to see such high research interest, BUT... 🚧 Technique seems intriguing, but the figures raise serious doubts about this paper's merit. 🧵 👇

Shenhao Wang

@ShenhaoWang_AI

Oct 3

🚀✨ Exciting Publication from @UrbanAI_Lab The paper “Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Spatial Reasoning” has been accepted to EMNLP 2025! Link: arxiv.org/pdf/2410.16162 #UrbanAI #VisionLanguageModels

ArcGIS Pro

@ArcGISPro

Aug 22

Say goodbye to manual data labeling and hello to instant insights! Our new #VisionLanguageModels can extract features from aerial images using just simple prompts. Simply upload an image and ask a question such as "What do you see?" Learn more: ow.ly/FsbV50WKnC3

Debora Nozza

@debora_nozza

Oct 7, 2024

Last week, I gave an invited talk at the 1st workshop on critical evaluation of generative models and their impact on society at #ECCV2024, focusing on unmasking and tackling bias in #VisionLanguageModels. Thanks to the organizers for the invitation!

Hans Willert

@HWillert

Oct 2

Using #VisionLanguageModels to Process Millions of Documents | Towards Data Science towardsdatascience.com/using-vision-l…

Learn how to effectively apply vision language models to problem solving

Using Vision Language Models to Process Millions of Documents | Towards Data Science

Source: towardsdatascience.com

Abhinav Girdhar

@AbhinavGirdhar

May 6, 2024

1/5 Can feedback improve semantic grounding in large vision-language models? A recent study delves into this question, exploring the potential of feedback in enhancing the alignment between visual and textual representations. #AI #VisionLanguageModels

Abhinav Girdhar

@AbhinavGirdhar

May 7, 2024

1/5 BRAVE: A groundbreaking approach to enhancing vision-language models (VLMs)! By combining features from multiple vision encoders, BRAVE creates a more versatile and robust visual representation. #AI #VisionLanguageModels

Vlad Ruso PhD

@vlruso

Sep 2

Apple’s FastVLM: 85x Faster Hybrid Vision Encoder Revolutionizing AI Models #FastVLM #VisionLanguageModels #AIInnovation #MultimodalProcessing #AppleTech itinai.com/apples-fastvlm… Apple has made a significant leap in the field of Vision Language Models (VLMs) with the introducti…

Eli Schwartz

@Eli_Schwartz

Mar 25

Did you know most vision-language models (like Claude, OpenAI, Gemini) totally suck at reading analog clocks ⏰? (Except Molmo—it’s actually trained for that) #AI #MachineLearning #VisionLanguageModels #vibecoding

GoatStack.AI

@GoatstackAI

Mar 8, 2024

Exploring the limitations of Vision-Language Models (VLMs) like GPT-4V in complex visual reasoning tasks. #AI #VisionLanguageModels #DeductiveReasoning

Ashshak_off_

@AshshakO

Mar 2

Alhamdulillah! Thrilled to share that our work "O-TPT" has been accepted at #CVPR2025! Big thanks to my supervisor and co-authors for the support! thread(1/n) #MachineLearning #VisionLanguageModels #CVPR2025

M. Akhtar Munir

@akhtarTalks

Feb 27

Thrilled to share that we have two papers accepted at #CVPR2025! 🚀 A big thank you to all the collaborators for their contributions. Stay tuned for more updates! Titles in the thread (1/n) #CVPR #VisionLanguageModels #ModelCalibration #EarthObservation

GoatStack.AI

@GoatstackAI

May 13, 2024

Exploring the capabilities of multimodal LLMs in visual network analysis. #LargeLanguageModels #VisualNetworkAnalysis #VisionLanguageModels

George Z Lin

@gzlin

Apr 2, 2024

Mini-Gemini, an innovative #framework from CUHK, optimizes #VisionLanguageModels with a dual-encoder, expanded data, and #LargeLanguageModels. #AI #ComputerVision arxiv.org/abs/2403.18814

GoatStack.AI

@GoatstackAI

Mar 9, 2024

Investigating vision-language models on Raven's Progressive Matrices showcases gaps in visual deductive reasoning. #VisualReasoning #DeductiveReasoning #VisionLanguageModels

JohnSnowLabs

@JohnSnowLabs

Aug 9

GoatStack.AI

@GoatstackAI

Jun 19, 2024

Introducing a comprehensive benchmark and large-scale dataset to evaluate and improve LVLMs' abilities in multi-turn and multi-image conversations. #DialogUnderstanding #VisionLanguageModels #MultiImageConversations

KUNGFU.AI

@kungfuai

May 23, 2024

Unlocking the Power of Vision Language Models: Exploring VLM and Its Applications #VisionLanguageModels #MultimodalModels #ImageQuestionAnswering #HuggingFaceLeaderboard #AIResearch #NLP #MachineLearning #ArtificialIntelligence #DataScience #Technology

HackerNoon | Learn Any Technology

@hackernoon

Oct 28

PerSense-D is a new benchmark dataset for personalized dense image segmentation, advancing AI accuracy in crowded visual environments. - hackernoon.com/new-dataset-pe… #visionlanguagemodels #denseimagesegmentation

New Dataset PerSense-D Enables Model-Agnostic Dense Object Segmentation | HackerNoon

Source: hackernoon.com

HackerNoon | Learn Any Technology

@hackernoon

Oct 28

Adaptive prompts, density maps, and VLMs are used in PerSense's training-free one-shot segmentation framework for dense picture interpretation. - hackernoon.com/persense-deliv… #visionlanguagemodels #denseimagesegmentation

PerSense Delivers Expert-Level Instance Recognition Without Any Training | HackerNoon

Source: hackernoon.com

HackerNoon | Learn Any Technology

@hackernoon

Oct 28

PerSense is a model-aware, training-free system for one-shot tailored instance division in dense images based on density and vision-language cues. - hackernoon.com/persense-a-one… #visionlanguagemodels #denseimagesegmentation

PerSense: A One-Shot Framework for Personalized Segmentation in Dense Images | HackerNoon

Source: hackernoon.com

Pablo Rivas

@_rivas_ai

Oct 22

2/4 The score is computed in three stages: baseline accuracy, degradation under noise, degradation under crafted attacks, then blended with tunable weights w₁ + w₂ = 1 to reflect specific risk profiles. #VisionLanguageModels

Shweta Bhardwaj@ICCV2025

@sh10bhardwaj

Oct 21

(3/3) 🤝 Open to #Collaboration and #Internship Opportunities on: 🧠 Data-centric AI 🤖 Vision-language Model training and evaluation Shoutout to amazing co-authors @JoLiang17 @zhoutianyi ! #VisionLanguageModels #DCAI #DataCentric #ResponsibleAI #ICCV #AI #ML #ComputerVision

leonliuzx

@leonliuzx

Oct 17

🚀 Exciting news! PaddleOCR-VL has rocketed to #1 on @huggingface Trending in just 16 hours! Dive in: huggingface.co/PaddlePaddle/P… #OCR #AI #VisionLanguageModels

Himanshu

@WaghHimanshu

Oct 8

A key challenge for VLMs is "grounding" - correctly linking text to visual elements. The latest research uses techniques like bounding box annotations and negative captioning to teach models to see and understand with greater accuracy. #DeepLearning #AI #VisionLanguageModels

Ahmed Masry @ COLM 2025 🇨🇦

@Ahmed_Masry97

Oct 7

💻 We have open-sourced the code at github.com/ServiceNow/Big… 🙌 This was a collaboration effort between @ServiceNowRSRCH , @Mila_Quebec , and @YorkUniversity. #COLM2025 #AI #VisionLanguageModels #Charts #BigCharts

GitHub - ServiceNow/BigCharts-R1

Source: github.com

abhishekjariwala

@abhijariwalaa

Oct 3

New research reveals a paradigm-shifting approach to data curation in vision-language models, unlocking their intrinsic capabilities for more accurate and efficient AI understanding. A big step forward in bridging visual and textual data! 🤖📊 #AI #VisionLanguageModels

HackerNoon | Learn Any Technology

@hackernoon

Oct 1

This paper summarizes a comprehensive framework for typographic attacks, proving their effectiveness and transferability against Vision-LLMs like LLaVA - hackernoon.com/future-of-ad-s… #visionlanguagemodels #visionllms

Future of AD Security: Addressing Limitations and Ethical Concerns in Typographic Attack Research |...

Source: hackernoon.com

Mozhgan

@Mozhgan_nasr

Oct 26

🚀 Submissions open for VLM4RWD @ NeurIPS 2025! Let’s make VLMs efficient & ready for the real world 🌎💡 🗓️ Deadline: Oct 31 📍 Mexico City 🇲🇽 🔗 openreview.net/group?id=NeurI… #NeurIPS2025 #VLM4RWD #VisionLanguageModels

Woojin Kim

@woojinrad

Jul 31, 2024

Daily Trending AI/ML Topics

@aitrendings

Oct 21

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale 👥 Haiwen Diao, Mingxuan Li, Silei Wu et al. #VisionLanguageModels #AIResearch #DeepLearning #OpenSource #ComputerVision 🔗 trendtoknow.ai

Vlad Ruso PhD

@vlruso

Apr 17

Pixel-SAIL: A Revolutionary Single-Transformer Model for Pixel-Level Vision-Language Tasks #PixelSAIL #VisionLanguageModels #ArtificialIntelligence #MachineLearning #Innovation itinai.com/pixel-sail-a-r…

IEEE Engineering Medicine and Biology Society

@IEEEembs

Jul 23

📢 Call for Papers — JBHI Special Issue: “Transparent Large #VisionLanguageModels in Healthcare” Seeking research on: ✔️ Explainable VLMs ✔️ Medical image-text alignment ✔️ Fair & interpretable AI 📅 Deadline: Sep 30, 2025 🔗 Info: tinyurl.com/4a7d69t2

greyson

@xSignLanguage

Oct 28, 2024

It may seem confusing now, but it will make sense to everyone in the future. @DeafUmbrella #AIResearch #VisionLanguageModels #MultimodalLLM #AIUnderstanding #VisualLanguageAI #AI

greyson

@xSignLanguage

Jul 5, 2024

Sign language is gaining traction to be a source of power.
