#MLLM search results

Open-source excellence! InternVL2.5, including our 78B model, is now available on @huggingface. InternVL2.5-78B is the first open #MLLM to achieve GPT-4o-level performance on MMMU.


🤯 AI research is moving FAST! Just saw "LatticeWorld," a framework using Multimodal LLMs for interactive complex world generation. Imagine AI building entire game worlds on the fly! #AI #MLLM #GameDev


🚨 LMCache now turbocharges multimodal models in vLLM! By caching image-token KV pairs, repeated images now get ~100% cache hit rate — cutting latency from 18s to ~1s. Works out of the box. Check the blog: blog.lmcache.ai/2025-07-03-mul… Try it 👉 github.com/LMCache/LMCache #vLLM #MLLM
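
For readers curious how this kind of image-aware caching can work in principle, here is a minimal Python sketch of the idea: cached KV data is keyed by a hash of the image bytes, so a repeated image skips the expensive vision-encode/prefill step. This is an illustration only; `ImageKVCache` and `compute_image_kv` are made-up names, not LMCache's or vLLM's actual API.

```python
# Minimal sketch of the caching idea only; not LMCache's actual API.
# The hash of the raw image bytes is the cache key, and `compute_image_kv`
# stands in for the expensive vision-encoder + prefill step.
import hashlib

class ImageKVCache:
    def __init__(self):
        self._store = {}  # image hash -> precomputed KV data

    def get_or_compute(self, image_bytes, compute_image_kv):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:              # first time: pay the full prefill cost
            self._store[key] = compute_image_kv(image_bytes)
        return self._store[key]                  # repeated image: cheap lookup

# Usage sketch: the second call with the same bytes returns the cached object.
cache = ImageKVCache()
fake_encoder = lambda b: {"kv_len": len(b)}      # placeholder for the real encoder
kv1 = cache.get_or_compute(b"same-image", fake_encoder)
kv2 = cache.get_or_compute(b"same-image", fake_encoder)
assert kv1 is kv2
```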


Wanna generate webpages automatically? Meet🧇WAFFLE: multimodal #LLMs for generating HTML from UI images! It uses structure-aware attention & contrastive learning. arxiv.org/abs/2410.18362 #LLM4Code #MLLM #FrontendDev #WebDev @LiangShanchao @NanJiang719 Shangshu Qian
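
As a rough illustration of the contrastive-learning component mentioned above, the sketch below pairs UI-screenshot embeddings with embeddings of their corresponding HTML using a symmetric InfoNCE-style loss. It is a generic contrastive objective over assumed embedding shapes, not WAFFLE's exact formulation.

```python
# Generic symmetric contrastive (InfoNCE-style) loss between image and HTML
# embeddings; matching pairs sit on the diagonal of the similarity matrix.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, html_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    html_emb = F.normalize(html_emb, dim=-1)
    logits = img_emb @ html_emb.t() / temperature     # (B, B) cosine similarities
    targets = torch.arange(img_emb.size(0))           # i-th image matches i-th HTML
    # symmetric: image-to-HTML and HTML-to-image directions
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```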


A pitfall of runaway AI robots?! 🚨 Beware of dangerous visual-trigger attacks! via: A must-read if you are bringing AI into your future business! Researchers have flagged the risk that MLLM-powered robots can malfunction when shown specific visual triggers. Protect the reliability and safety of your business! 🛡️ #AISecurity #MLLM


$MLLM: the team is anonymous and there is a high risk of a dump; see the rugcatcher.finance MultiLLM report. CA: BV7jYpQgdgQhx8eyVLjp2vyJhpFDHic1Q3jAYGTZpump #MLLM


Two of our papers have been accepted to the #ACL2025 main conference! Try our code generation benchmark 🐟RepoCod (lt-asset.github.io/REPOCOD/) and website generation tool 🧇WAFFLE (github.com/lt-asset/Waffle)! #LLM4Code #MLLM #FrontendDev #WebDev #LLM #CodeGeneration #Security


🔥🔥Tongyi Mobile-Agent Team presents Semi-online Reinforcement Learning, UI-S1, which simulates online RL on offline trajectories to handle sparse rewards and deployment costs. 📖Paper: huggingface.co/papers/2509.11… 🔗Code: github.com/X-PLUG/MobileA… #LLMs #Multimodal #MLLM @_akhaliq
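
One plausible reading of "simulates online RL on offline trajectories" is a rollout loop in which the policy acts over a logged GUI trajectory and earns dense, step-level feedback for matching the logged actions, with no live device in the loop. The toy sketch below shows that reading only; the trajectory format and reward are assumptions, and the actual UI-S1 algorithm is in the linked paper and code.

```python
# Toy "semi-online" rollout over a logged trajectory: the policy proposes an
# action at each logged observation, is rewarded per matching step, and the
# rollout stops once it diverges (there is no simulator to continue on).
def semi_online_rollout(policy, trajectory):
    """trajectory: list of (observation, logged_action) pairs from offline data."""
    episode, reward = [], 0.0
    for obs, logged_action in trajectory:
        action = policy(obs)
        match = (action == logged_action)
        reward += 1.0 if match else 0.0      # dense step-level reward vs. a sparse task reward
        episode.append((obs, action, match))
        if not match:
            break
    return episode, reward

# Usage with a trivial policy over a two-step toy trajectory:
traj = [("home_screen", "open_settings"), ("settings", "tap_wifi")]
policy = lambda obs: "open_settings" if obs == "home_screen" else "tap_wifi"
print(semi_online_rollout(policy, traj))
```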


🔥🔥🔥We introduce an O3-like multimodal reasoning model, VLM-R3, which seamlessly weaves relevant sub-image content back into an interleaved chain-of-thought. 📖Paper: huggingface.co/papers/2505.16… 🔗Code: github.com/X-PLUG #LLMs #Multimodal #MLLM @_akhaliq
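
The core idea, as described, is that the model can crop a relevant image region and feed it back into its own chain-of-thought. Below is a minimal sketch of such interleaving; the message schema, the `add_zoom_step` helper, and the crop coordinates are illustrative assumptions, not VLM-R3's actual interface.

```python
# Sketch of interleaving a cropped sub-image into an ongoing chain-of-thought:
# a reasoning step (text) is followed by the cropped region as a new image turn.
from PIL import Image

def add_zoom_step(messages, image, crop_box, rationale):
    region = image.crop(crop_box)  # (left, top, right, bottom) in pixels
    messages.append({"role": "assistant", "type": "text", "content": rationale})
    messages.append({"role": "assistant", "type": "image", "content": region})
    return messages

# Usage sketch on a blank placeholder image:
img = Image.new("RGB", (640, 480))
chat = [{"role": "user", "type": "text", "content": "What does the sign in the top-left say?"}]
chat = add_zoom_step(chat, img, (0, 0, 160, 120),
                     "Zooming into the top-left corner to read the sign.")
```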


Apple releases Ferret 7B multimodal large language model (MLLM) #Apple #Ferret #MLLM #LLM #AI #TechAI #LearningAI #GenerativeAI #DeepbrainAI


TFAR: A Training-Free Framework for Autonomous Reliable Reasoning in Visual Question Answering. Zhuo Zhi, Chen Feng, Adam Daneshmend, et al. Action editor: Nicolas Thome. openreview.net/forum?id=cBAKe… #reasoning #mllm #tools


🔥🔥Mobile-Agent-E Release!! Hierarchical multi-agent framework, and a cross-task long-term memory with self-evolving 💡Tips and 🚀Shortcuts. 🌏Project: x-plug.github.io/MobileAgent/ 📖Paper: huggingface.co/papers/2501.11… 🔗Code: github.com/X-PLUG/MobileA… #LLMs #MLLM #Agent @_akhaliq
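
To make the "cross-task long-term memory with Tips and Shortcuts" concrete, here is a minimal sketch of what such a memory could look like: a persistent store of natural-language tips and named, reusable action sequences that is updated after every task. The field names and structure are assumptions for illustration; the real design is in the linked paper and code.

```python
# Minimal sketch of a self-evolving cross-task memory: "tips" are lessons in
# natural language, "shortcuts" map a name to a reusable action sequence.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class LongTermMemory:
    tips: list = field(default_factory=list)        # list of tip strings
    shortcuts: dict = field(default_factory=dict)   # name -> list of action strings

    def update_after_task(self, new_tips, new_shortcuts):
        self.tips.extend(t for t in new_tips if t not in self.tips)
        self.shortcuts.update(new_shortcuts)

    def save(self, path):
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

# Usage sketch: the agent distills tips/shortcuts after a task and stores them.
mem = LongTermMemory()
mem.update_after_task(
    new_tips=["Dismiss the keyboard before tapping 'Send'."],
    new_shortcuts={"open_wifi_settings": ["open Settings", "tap 'Network'", "tap 'Wi-Fi'"]},
)
```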


Introducing Ovis 2.5 - our latest multimodal LLM breakthrough! Featuring enhanced visual perception & reasoning capabilities, consistently outperforming larger models across key benchmarks. #Ovis #MLLM


🦙🦙🦙We present #LAMM, a growing open-source community aimed at helping researchers and developers quickly train and evaluate Multi-modal Large Language Models (#MLLM), and further build multi-modal AI agents. As one of the very first open-source endeavors in the #MLLM field,…


LAMBDA: Advancing Autonomous Reasoning with Nuro's MLLM In this video, you'll see LAMBDA, Nuro's Multimodal Large Language Model (MLLM), and its integration into The Nuro Driver's™ onboard autonomy stack. Video source: @nuro #autonomousvehicles #MLLM #technology #engineering


🤔 Which multimodal LLM truly hits the mark on precision pointing & visual reasoning? 🏆 Jump into our #PointBattle and watch top MLLMs go head-to-head—only one can point supreme! 👉 Try it now: de87c27de85699eb1b.gradio.live #AI #MLLM #Robotics


🔥🔥PC-Agent Release!! We propose the PC-Agent framework to handle the complex interactive environments and tasks of PC scenarios. 📖Paper: huggingface.co/papers/2502.14… 🔗Code: github.com/X-PLUG/MobileA… #LLMs #Multimodal #MLLM @_akhaliq


Just wrapped up Day 3 at @DeepIndaba! 🌟 Thrilled to share our latest research on low-resource AI: "Closing the Gap in ASR" & "Challenging Multimodal LLMs" 💡 Great convos & feedback! #AfricanNLP #ASR #MLLM #LyngualLabs


🧵1/10 Excited to share our @siggraph paper "MonetGPT: Solving Puzzles Enhances MLLMs' Image Retouching Skills" 🌟 We explore how to make MLLMs operation-aware by solving visual puzzles and propose a procedural framework for image retouching #SIGGRAPH #MLLM



PixelCraft significantly boosts performance for strong MLLMs (GPT-4o, Claude 3.7) on tough benchmarks like CharXiv, ChartQAPro, & Geometry3K. Paper: arxiv.org/pdf/2509.25185 Code: github.com/microsoft/Pixe… #AI #MLLM #VisualReasoning #ComputerVision #MultiAgent


AI revolution! DeepMMSearch-R1 is the first #MLLM to beat #GPT at multimodal web search and #VQA. It uses self-correction and image-crop search for real-time knowledge. The future of #WebSearch has arrived! #DeepMMSearchR1 youtu.be/7MapdFGHl1o



😆 We are presenting our work, “Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval” in Hawaii, ICCV’25! Time: 10/23 (Thu), 10:45 Location: Exhibit Hall I PosterID: 232 Paper: arxiv.org/pdf/2507.23284 #ICCV2025 #Retrieval #MLLM


🔥🔥🔥mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models 📖Paper: arxiv.org/abs/2408.04840 🔗Code: github.com/X-PLUG/mPLUG-O… #LLMs #Multimodal #MLLM @_akhaliq


🚀mPLUG's latest work!! 🌏TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning 🔗Code: github.com/X-PLUG/mPLUG-D… 📚Paper: huggingface.co/papers/2404.16… #multimodal #LLM #MLLM @_akhaliq
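
As a toy illustration of visual token merging, the sketch below repeatedly averages the most similar pair of adjacent visual tokens until a target length is reached, which is how merging cuts the token count (and therefore compute) fed to the language model. TinyChart's actual merging strategy may differ; this only shows the general mechanism.

```python
# Toy visual-token merging: greedily average the most similar adjacent pair of
# token embeddings until only `target_len` tokens remain.
import torch
import torch.nn.functional as F

def merge_tokens(tokens, target_len):
    # tokens: (N, D) visual token embeddings
    tokens = tokens.clone()
    while tokens.size(0) > target_len:
        sims = F.cosine_similarity(tokens[:-1], tokens[1:], dim=-1)  # adjacent-pair similarity
        i = int(sims.argmax())                                       # most redundant pair
        merged = (tokens[i] + tokens[i + 1]) / 2
        tokens = torch.cat([tokens[:i], merged.unsqueeze(0), tokens[i + 2:]], dim=0)
    return tokens

out = merge_tokens(torch.randn(576, 768), target_len=144)  # e.g. 576 patch tokens -> 144
print(out.shape)  # torch.Size([144, 768])
```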


just found TinyGPT-V, a parameter-efficient #MLLM built on the Phi-2 small-language-model framework, with very promising benchmarks arxiv.org/abs/2312.16862


🚀 Exploring new frontiers in multimodal learning! HyperLLaVA significantly improves MLLM performance through dynamic tuning of visual and language experts. Check out this cutting-edge research and open a new chapter in multimodal intelligence! 🌟 #MLLM #HyperLLaVA #MultimodalAI


HKU and Tencent propose the Plot2Code benchmark, evaluating Multimodal LLMs' code generation from scientific plots, highlighting challenges and driving future research in visual coding. #AI #MLLM #DataScience



Multimodal models that confuse left and right… are now learning 3D space like humans. This new research redefines spatial understanding in AI. A thread on Multi-SpatialMLLM — and how it may change robotics, AR, and more 🚀🧠 #AI #MLLM #ComputerVision #Robotics


BharatGen (भारतजेन) Source: NEXT IAS magazine - Building ethical, inclusive, and domain-specific AI solutions. - BharatGen was launched by the Minister of State (Independent Charge) for Science and Technology. Read more: nextias.com/ca/current-aff…भारत-ai-llm-मॉडल-विकसित #NextIAS #Mllm

