
EmbeddedLLM

@EmbeddedLLM

Your open-source AI ally. We specialize in integrating LLM into your business.

Pinned

Pro-tip for vLLM power-users: free ≈90% of your GPU VRAM in seconds—no restarts required 🚀

🚩 Why you'll want this
• Hot-swap new checkpoints on the same card
• Rotate multiple LLMs on one GPU (batch jobs, micro-services, A/B tests)
• Stage-based pipelines that call…
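
A minimal sketch of how this works in practice, using vLLM's sleep-mode API (`enable_sleep_mode`, `llm.sleep()`, `llm.wake_up()`); the model name is only an example, and exact flags may shift between vLLM versions:

```python
# Minimal sketch of vLLM sleep mode (vLLM >= 0.7; behavior may vary by
# version). Level-1 sleep offloads weights to CPU RAM and drops the KV
# cache, freeing most GPU VRAM without restarting the process.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct",  # example model
          enable_sleep_mode=True)            # must be set at construction

print(llm.generate(["Hello!"], SamplingParams(max_tokens=16)))

llm.sleep(level=1)   # VRAM is now mostly free for another job on this GPU
# ... run a different model / batch job / A-B variant here ...
llm.wake_up()        # weights stream back; no cold start, no restart

print(llm.generate(["Welcome back!"], SamplingParams(max_tokens=16)))
```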

EmbeddedLLM reposted

High performance. Scalable. Flexible. Open. @Oracle and AMD are scaling AI to new heights. Beginning in Q3 2026, @OracleCloud Infrastructure will be the first hyperscaler to offer a public AI supercluster, with an initial 50K AMD Instinct MI450 Series GPU deployment. 🔗 See…

EmbeddedLLM reposted

🚀 vLLM just hit 60K GitHub stars! 🎉 From a small research idea to powering LLM inference everywhere — across NVIDIA, AMD, Intel, Apple, TPUs, and more — vLLM now supports almost all major text-generation models and native RL pipelines like TRL, Unsloth, Verl, and OpenRLHF.…


EmbeddedLLM reposted

🚀 vLLM x MinerU: Document Parsing at Lightning Speed! We’re excited to see MinerU fully powered by vLLM — bringing ultra-fast, accurate, and efficient document understanding to everyone. ⚡ Powered by vLLM’s high-throughput inference engine, MinerU 2.5 delivers: Instant…


MinerU 2.5 has arrived with demo on @huggingface

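Since MinerU 2.5 runs on vLLM's OpenAI-compatible server, the request shape is a standard multimodal chat call, roughly as sketched below; the model id, port, and prompt wording are assumptions for illustration, not verified values.

```python
# Illustrative only: querying a MinerU-style document parser through vLLM's
# OpenAI-compatible API. The serve command and model id are assumptions:
#   vllm serve opendatalab/MinerU2.5-2509-1.2B --trust-remote-code
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="opendatalab/MinerU2.5-2509-1.2B",  # hypothetical model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/page-scan.png"}},
            {"type": "text",
             "text": "Parse this page to Markdown; keep tables and formulas."},
        ],
    }],
)
print(resp.choices[0].message.content)
```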


EmbeddedLLM reposted

Happy that InferenceMAX is here because it signals a milestone for vLLM's SOTA performance on NVIDIA Blackwell! 🥳 It has been a pleasure to deeply collaborate with @nvidia in @vllm_project, and we have much more to do. Read about the work we did here: blog.vllm.ai/2025/10/09/bla…

Today we are launching InferenceMAX! We have support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, Oracle, CoreWeave, TogetherAI, Nebius, Crusoe, HPE, SuperMicro, Dell. It runs every day on the latest software (vLLM, SGLang, etc.) across hundreds of GPUs, $10Ms of…



EmbeddedLLM reposted

🚀 The RL community keeps pushing boundaries — from better on-policy data and partial rollouts to in-flight weight updates that mix KV caches across models during inference. Continuing inference while weights change and KV states stay stale sounds wild — but that’s exactly what…

I am excited to open-source PipelineRL - a scalable async RL implementation with in-flight weight updates. Why wait until your bored GPUs finish all sequences? Just update the weights and continue inference! Code: github.com/ServiceNow/Pip… Blog: huggingface.co/blog/ServiceNo…

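The core idea lends itself to a tiny sketch (our toy illustration, not PipelineRL's actual API): decoding never drains; a trainer coroutine bumps the weight version mid-sequence, and the KV cache built under older weights is deliberately kept.

```python
# Toy illustration of in-flight weight updates (not PipelineRL's real API):
# the generator never stops to wait for the trainer; weights change between
# decode steps and the stale KV cache simply carries over.
import asyncio

weights_version = 0

async def trainer():
    global weights_version
    for _ in range(3):
        await asyncio.sleep(0.5)          # pretend to run an optimizer step
        weights_version += 1              # in-place weight update
        print(f"[trainer] pushed weights v{weights_version}")

async def generator(seq_id: int):
    kv_cache = []                         # survives weight updates (stale on purpose)
    for step in range(20):
        await asyncio.sleep(0.1)          # pretend to decode one token
        kv_cache.append(f"tok{step}@v{weights_version}")
    print(f"[gen {seq_id}] done, KV mixes versions: {kv_cache[0]} .. {kv_cache[-1]}")

async def main():
    await asyncio.gather(trainer(), *(generator(i) for i in range(2)))

asyncio.run(main())
```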


EmbeddedLLM reposted

Getting ready to try DeepSeek-V3.2-Exp from @deepseek_ai? vLLM is here to help! We have verified that it works on H200 machines, and on many other hardware platforms thanks to the hardware plugin mechanism. Check out the recipes docs.vllm.ai/projects/recip… for more details 😍 Note: currently…




EmbeddedLLM reposted

How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA). It scores past tokens against each incoming query, and the top-2048 tokens are passed to Sparse MLA.


🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n
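
To make that selection step concrete, here is a toy PyTorch sketch; the 128-d indexer keys, 512-d MLA cache, and top-2048 cut follow the description above, while the random tensors and plain dot-product scoring are illustrative stand-ins for DeepSeek's actual kernels.

```python
# Toy sketch of the DSA selection path: a lightweight indexer scores all
# past tokens for each query, and full attention runs only over the top-k.
import torch
import torch.nn.functional as F

T, d_idx, d_full, k = 8192, 128, 512, 2048  # seq len, indexer dim, MLA dim, top-k

idx_keys  = torch.randn(T, d_idx)    # small per-token indexer cache (128-dim)
full_keys = torch.randn(T, d_full)   # full latent KV cache (512-dim)
values    = torch.randn(T, d_full)

query_idx  = torch.randn(d_idx)      # indexer view of the incoming query
query_full = torch.randn(d_full)     # full-attention view of the same query

# 1) Lightning Indexer: cheap scores over all T past tokens.
scores = idx_keys @ query_idx                      # (T,)

# 2) Keep only the top-k tokens.
top = torch.topk(scores, k).indices                # (k,)

# 3) Sparse attention: full attention restricted to those k tokens.
attn = F.softmax((full_keys[top] @ query_full) / d_full**0.5, dim=0)
out = attn @ values[top]                           # (d_full,)
print(out.shape)
```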



EmbeddedLLM reposted

🚀 New in vLLM: dots.ocr 🔥 A powerful multilingual OCR model from @xiaohongshu hi lab is now officially supported in vLLM! 📝 Single end-to-end parser for text, tables (HTML), formulas (LaTeX), and layouts (Markdown) 🌍 Supports 100 languages with robust performance on…


we're all sleeping on this OCR model 🔥 dots.ocr is a new 3B model with SOTA performance, support for 100 languages & allowing commercial use! 🤯 a single e2e model to extract text from images and convert tables, formulas, and more into markdown 📝

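For offline use rather than a server, vLLM's multimodal generate API looks roughly like the sketch below; the HF id rednote-hilab/dots.ocr is taken from the model's release, while the prompt text is an assumption (many VLMs need image-placeholder tokens from their chat template; check the model card).

```python
# Sketch: offline dots.ocr inference with vLLM. Model id believed correct
# but unverified; the prompt format is an assumption.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="rednote-hilab/dots.ocr", trust_remote_code=True)

image = Image.open("invoice_page.png")
prompt = ("Parse this document page: output text as Markdown, "
          "tables as HTML, and formulas as LaTeX.")

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=2048),
)
print(outputs[0].outputs[0].text)
```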


EmbeddedLLM reposted

Missed our latest vLLM office hours? We covered hybrid models as first-class citizens in @vllm_project. ✅ Hybrid model support in v1 ✅ Mamba, Mamba2, linear attention ✅ Performance from v0 → v1 ▶️ Recording: youtube.com/live/uWQ489ONv… 📑 Slides: docs.google.com/presentation/d…


EmbeddedLLM reposted

We keep pushing the limits of speculative decoding (SD) in LLM inference -- check out our latest NeurIPS'25 paper: Lookahead Reasoning (LR). The high-level rationale is pretty simple: SD alone isn't enough now; as GPUs get stronger (H200 -> B200 -> Rubin CPX), we'll be able to…

[1/N]🚀New decoding paradigm drop!🚀 Introducing Lookahead Reasoning(LR): step-level speculation that stacks with Speculative Decoding(SD). It has been accepted to #NeurIPS2025 🎉 📖 Blog: hao-ai-lab.github.io/blogs/lookahea… 💻 Code: github.com/hao-ai-lab/Loo… 📄 Paper: arxiv.org/abs/2506.19830
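
In toy form (our illustration, not the paper's code), step-level speculation mirrors token-level SD: a cheap drafter proposes several future reasoning steps, the target verifies those positions in parallel, and the longest verified prefix is accepted, with the target's own step substituted at the first mismatch. draft_step and target_step below are hypothetical stand-ins for real models.

```python
# Toy step-level speculation in the spirit of Lookahead Reasoning
# (illustrative, not the paper's implementation).
def draft_step(ctx):          # hypothetical cheap drafter
    return f"step{len(ctx)}"

def target_step(ctx):         # hypothetical target model (ground truth)
    return f"step{len(ctx)}" if len(ctx) % 3 != 2 else f"STEP{len(ctx)}"

def lookahead_reasoning(ctx, k=4):
    # 1) drafter proposes k future steps sequentially (cheap)
    proposals, tmp = [], list(ctx)
    for _ in range(k):
        s = draft_step(tmp); proposals.append(s); tmp.append(s)
    # 2) target verifies all k positions (one batched call in practice)
    verified = []
    for i, s in enumerate(proposals):
        expected = target_step(ctx + proposals[:i])
        if expected != s:              # first mismatch: keep target's own step
            verified.append(expected)
            break
        verified.append(s)
    return ctx + verified              # accept longest verified prefix

ctx = []
while len(ctx) < 10:
    ctx = lookahead_reasoning(ctx)
print(ctx)
```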



EmbeddedLLM reposted

Day-0 support on one of the most anticipated model releases🚀🚀🚀 Detailed deployment guide coming soon at vllm-recipes docs.vllm.ai/projects/recip…


🚀 We're thrilled to unveil Qwen3-VL — the most powerful vision-language model in the Qwen series yet! 🔥 The flagship model Qwen3-VL-235B-A22B is now open-sourced and available in both Instruct and Thinking versions: ✅ Instruct outperforms Gemini 2.5 Pro on key vision…



EmbeddedLLM reposted

Congrats to @deepseek_ai! DeepSeek-R1 was published in Nature yesterday as the cover article, and vLLM is proud to have supported its RL training and inference 🥰


EmbeddedLLM reposted

Disaggregated Inference at Scale with #PyTorch & #vLLM: Meta’s vLLM disagg implementation improves inference efficiency in latency & throughput vs its internal stack, with optimizations now being upstreamed to the vLLM community. 🔗 hubs.la/Q03J87tS0


EmbeddedLLM reposted

😂 Although AMD is now working pretty well for small to medium-sized models


EmbeddedLLM reposted

Welcome Qwen3-Next! You can run it efficiently on vLLM with accelerated kernels and native memory management for hybrid models. blog.vllm.ai/2025/09/11/qwe…


🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here! 🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!) 🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…



EmbeddedLLM reposted

Deep dive into optimizing weight transfer step by step and improving it 60x!

1.5 seconds is long enough to transfer model weights from training nodes to RL rollout nodes (as opposed to 100s). Here's the full story of how I made it (not just presenting the solution): le.qun.ch/en/blog/2025/0…



EmbeddedLLM reposted

⚡️ Efficient weight updates for RL at trillion-parameter scale 💡 Best practice from Kimi @Kimi_Moonshot. vLLM is proud to collaborate with checkpoint-engine: • Broadcast weight sync for 1T params in ~20s across 1000s of GPUs • Dynamic P2P updates for elastic clusters •…

Introducing checkpoint-engine: our open-source, lightweight middleware for efficient, in-place weight updates in LLM inference engines, especially effective for RL. ✅ Update a 1T model on thousands of GPUs in ~20s ✅ Supports both broadcast (sync) & P2P (dynamic) updates ✅…

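For intuition, here is a toy broadcast-style sync (our illustration, not checkpoint-engine's implementation; the real project is at github.com/MoonshotAI/checkpoint-engine): rank 0 holds fresh trainer weights and every inference rank receives them in-place, in size-bounded buckets so peak extra memory stays flat. It assumes an initialized NCCL process group and parameters of a single dtype; the real middleware also supports P2P updates for elastic clusters.

```python
# Toy broadcast weight sync (illustrative). Assumes dist.init_process_group
# has run and all parameters share one dtype.
import torch
import torch.distributed as dist

def broadcast_weights(model: torch.nn.Module, src: int = 0, bucket_mb: int = 512):
    params, size = [], 0

    def flush(group):
        if not group:
            return
        flat = torch.cat([p.data.reshape(-1) for p in group])  # pack one bucket
        dist.broadcast(flat, src=src)                          # single collective
        off = 0
        for p in group:                                        # unpack in-place
            n = p.data.numel()
            p.data.copy_(flat[off:off + n].view_as(p.data))
            off += n

    for p in model.parameters():
        params.append(p)
        size += p.data.numel() * p.data.element_size()
        if size >= bucket_mb * 2**20:    # cap extra memory at ~bucket_mb
            flush(params)
            params, size = [], 0
    flush(params)

# Usage inside a torchrun job, after each optimizer step on rank 0:
#   broadcast_weights(inference_model, src=0)
```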


vLLM Singapore Meetup — Highlights
Thanks to everyone who joined!

Check out the slides by @vllm_project DarkLight1337 with tjtanaa / @EmbeddedLLM
V1 is here: faster startup, stronger CI & perf checks.
Scaling MoE: clear Expert Parallelism (EP) setup for single/multi-node +…

EmbeddedLLM reposted

vLLM is proud to support the great Kimi update from @Kimi_Moonshot: better tool-calling, longer context, and more! Check the deployment guide at huggingface.co/moonshotai/Kim… 🔥


Kimi K2-0905 update 🚀 - Enhanced coding capabilities, esp. front-end & tool-calling - Context length extended to 256k tokens - Improved integration with various agent scaffolds (e.g., Claude Code, Roo Code, etc) 🔗 Weights & code: huggingface.co/moonshotai/Kim… 💬 Chat with new Kimi…



EmbeddedLLM reposted

Amazing blog post from @gordic_aleksa explaining the internals of vLLM 😍

New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in depth explanation of how LLM inference engines and vLLM in particular work! Took me a while to get this level of understanding of the codebase and then to write up…


