#vllm search results

After a few weeks of fiddling, last night I finally saw traffic ramping up normally without errors. Only once you scale up do you realize how many vLLM pitfalls there are to step in. Today we're heading for 6M TPM, still 99% short of maxing out. #vllm

Docker Model Runner + @vllm_project - run safetensors models and scale to production without leaving your Docker workflow ⚡️ 🔗 Try it out: bit.ly/4psZN7z #Docker #vLLM #ModelRunner #AI #DevTools


(1/n) We are drastically overestimating the cost of LLMs, because we sometimes over-focus on single-query speed. Had the privilege to talk about this topic at the #vllm meetup yesterday. An average human reads 350 words per minute, which translates to roughly 5.8 words per second.

Thank you to everyone who filed issues, reviewed PRs, ran benchmarks, and helped shape this release. vLLM grows because the community does. Easy, fast, and cheap LLM serving for everyone. 🧡 #vLLM #AIInfra #OpenSource

Has vLLM solved the NVIDIA RTX 6000 Pro multi-GPU problem? Local LLM setups are about to take a dramatic leap forward! 🎉 #vLLM #ローカルLLM

Working on adding MLX/ MLX-LLM / vLLM to Fabric - you can run local LLM models in nodes alongside metal shaders, geometry, compute, realtime video processing, segmentations and key point analysis and make weird shit. #mlx #llm #vllm cc @awnihannun

The secret to blazing-fast LLM inference: the "essence" of vLLM, revealed ✨ A must-read for anyone who wants AI services to be fast and cheap! This deep dive into vLLM's internals will help you speed up inference and cut costs 🚀 #vLLM #AI活用

Having seen way too many vLLM forks, this looks like a great way forward - Building Clean, Maintainable #vLLM Modifications Using the Plugin System blog.vllm.ai/2025/11/20/vll… #AI #LLM
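
As a rough illustration of what the plugin route looks like (a minimal sketch assuming vLLM's entry-point-based plugin mechanism; the package, entry-point name, and model class below are hypothetical):

```python
# vllm_my_plugin/__init__.py -- a hypothetical out-of-tree vLLM plugin.
# vLLM discovers plugins through Python entry points, so the package's
# pyproject.toml would declare something like:
#
#   [project.entry-points."vllm.general_plugins"]
#   my_plugin = "vllm_my_plugin:register"


def register():
    """Called once by vLLM at startup for each registered plugin."""
    from vllm import ModelRegistry

    # Register a custom model class without forking vLLM itself.
    # The lazy "module:ClassName" string defers the heavy import until
    # the model is actually requested.
    ModelRegistry.register_model(
        "MyCustomModelForCausalLM",
        "vllm_my_plugin.my_model:MyCustomModelForCausalLM",
    )
```

Keeping modifications behind an entry point like this is what lets them survive vLLM upgrades without maintaining a fork.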


The secret to turbocharging LLMs is vLLM! 🚀 vLLM dramatically speeds up AI model inference and cuts operating costs ✨ Great support for individuals and small businesses rolling out AI services. AI adoption gets a lot more comfortable! #vLLM

DeepSeek's (@deepseek_ai) latest—MLA, Multi-Token Prediction, 256 Experts, FP8 block quantization—shines with @vllm_project. Catch the office hours session where we discuss all the DeepSeek goodies and explore their integration and benchmarks with #vLLM.


Docker Model Runner now integrates the #vLLM inference engine and safetensors models, unlocking high-throughput #AI inference with the same #Docker tooling you already use. docker.com/blog/docker-mo… #LLM


Getting 10x speedup in #vLLM is easier than you think 📈 I just discovered speculative decoding with ngram lookup and the results speak for themselves. Here's what you add to your vLLM serve command: speculative_config={ "method": "ngram", "num_speculative_tokens": 8,…
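
For reference, a minimal offline sketch of the n-gram setup the tweet is describing (assuming a recent vLLM release where `speculative_config` is passed as a dict; the model and parameter values are illustrative, and the gains depend heavily on how much of the output can be copied from the prompt):

```python
from vllm import LLM, SamplingParams

# N-gram speculation drafts tokens by matching repeated substrings in the
# prompt itself, so no separate draft model is needed. It helps most when the
# output copies heavily from the input (extraction, summarization, code edits).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    speculative_config={
        "method": "ngram",
        "num_speculative_tokens": 8,  # tokens drafted per step
        "prompt_lookup_max": 4,       # longest n-gram to match in the prompt
    },
)

outputs = llm.generate(
    ["Summarize the following document:\n..."],
    SamplingParams(max_tokens=256, temperature=0.0),
)
print(outputs[0].outputs[0].text)
```

The same dict can reportedly be passed to `vllm serve` via `--speculative-config`.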

Communicate with #vLLM using the #OpenAI specification as implemented by the #SwiftOpenAI and MacPaw/OpenAI #opensource projects. 🔗 red.ht/3GfSQWs
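
vLLM serves an OpenAI-compatible API, so any OpenAI client can talk to it; the Swift libraries in the post follow the same pattern as this minimal Python sketch (the local server address and model name are assumptions):

```python
from openai import OpenAI

# Point the standard OpenAI client at a local vLLM server started with e.g.:
#   vllm serve Qwen/Qwen2.5-7B-Instruct
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # vLLM ignores the key unless --api-key is set
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Explain PagedAttention in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Swap the base URL and keep the OpenAI request/response shapes; nothing else about the client has to change.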

🚀 llm-d v0.3.1 is LIVE! 🚀 This patch release is packed with key follow-ups from v0.3.0, including new hardware support, expanded cloud provider integration, and streamlined image builds. Dive into the full changelog: github.com/llm-d/llm-d/re… #llmd #OpenSource #vLLM #Release

🥳AutoRound landed in @vllm_project llm-compressor, supporting INT2 - INT8, MXFP4, NVFP4, FP8 and MXFP8 quantization for LLMs/VLMs on Intel CPUs/GPUs/HPUs and CUDA. Thanks to team & community. Github github.com/intel/auto-rou… and PR: github.com/vllm-project/l… #intel #autoround #vllm


Full house at the #vLLM and @ollama meetup in SF hosted by @ycombinator. Great to see familiar faces and meet new ones!

Disaggregated Inference at Scale with #PyTorch & #vLLM: Meta’s vLLM disagg implementation improves inference efficiency in latency & throughput vs its internal stack, with optimizations now being upstreamed to the vLLM community. 🔗 hubs.la/Q03J87tS0

Batch Inference with Qwen2 Vision LLM (Sparrow) I explain several tips for optimizing Qwen2 Vision LLM performance for batch processing. Complete video: youtube.com/watch?v=9SmQxT… Code: github.com/katanaml/sparr… Sparrow UI: katanaml-sparrow-ui.hf.space @katana_ml #vllm #ocr
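
The gist of the batching advice is to submit many documents in one call so vLLM's continuous batching can overlap them; here is a rough sketch using vLLM's offline multimodal API (the model name, file names, and bare prompt string are illustrative; a real prompt must follow the model's chat template with its image placeholder tokens):

```python
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct", max_model_len=8192)

# One request per document image.
# NOTE: Qwen2-VL expects its image placeholder tokens in the prompt; build the
# real prompt with the processor/tokenizer chat template rather than this
# simplified string.
images = [Image.open(p) for p in ["invoice_01.png", "invoice_02.png"]]
requests = [
    {
        "prompt": "Extract all line items from this document as JSON.",
        "multi_modal_data": {"image": img},
    }
    for img in images
]

# Passing the whole list lets vLLM batch prefill and decode across requests
# instead of processing documents one by one.
outputs = llm.generate(requests, SamplingParams(max_tokens=512, temperature=0.0))
for out in outputs:
    print(out.outputs[0].text)
```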


Successfully deployed WiNGPT-3.5 30BA3B and MedEvidence on NVIDIA’s new DGX Spark using vLLM. 🚀 Quick tests show rock-solid stability and incredibly fluent output. Impressed with the performance! #AI #NVIDIA #vLLM #LLM #MedicalAI


💡 Local LLM head-to-head: vLLM vs llama.cpp. Running Llama-3-8B on an RTX 4090: vLLM: 120-180 tokens/s; llama.cpp: 25-30 tokens/s. vLLM is 4-6x faster! Choosing the right tool for your use case matters. #ローカルLLM #AIHack #vLLM
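
Numbers like these are workload- and quantization-dependent, but they are easy to sanity-check; a minimal throughput measurement with vLLM's offline API could look like the sketch below (model name, batch size, and token counts are illustrative):

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(max_tokens=256, temperature=0.8, ignore_eos=True)

# A batch of prompts: vLLM's throughput advantage comes largely from
# continuous batching, so measure with concurrency, not one prompt at a time.
prompts = [f"Write a short story about robot #{i}." for i in range(32)]

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

For a fair comparison with llama.cpp, use a comparable quantization level and batch size; vLLM's lead grows with concurrency, while single-stream numbers are much closer.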


GPU server vs. local LLM... bzzzt... which one to pick, you ask? 🤔 vLLM (Python) or llama.cpp (C++)... hmm, the oracle says "consult your wallet 💰"! #LLM #vLLM #llama_cpp tinyurl.com/26bwpo8p


vLLM v0.11.1 makes LLMs blazing fast, even on older GPUs! 🚀 vLLM v0.11.1 is out, dramatically improving LLM inference speed even on Turing GPUs (e.g. the RTX 2080). Prefill in particular is faster, a welcome update that squeezes the most out of existing hardware. Give it a try! ✨ #vLLM #LLM高速化

Accelerating the take-up of some of our SOTA research into the Enterprise Search and Reason market. #LightOnOCR is now compatible with #vLLM.

LightOnOCR-1B is now part of @vllm_project v0.11.2! Transform documents into structured Markdown in a single pass: 6.5x faster than dots.ocr, 2.7x faster than PaddleOCR. Try production-ready end-to-end OCR now! Zero pipeline complexity, provided to you by @LightOnIO and #vLLM:…



Try building #vllm from source for $META cwm

Giving #ComfyUI-Molmo a try. What it does is simple: image → VLLM → image generation with FLUX.1 dev + Depth. The key piece is the #VLLM model #Molmo-7B-D, reportedly somewhere between GPT-4V and GPT-4o in capability. In a quick test NSFW was fine too. Which is better, this or #JoyCaption? #AI美女 #AIグラビア github.com/CY-CHENYUE/Com…

My testing of #Vllm inside the #Kubernetes cluster. That is my first question to test.

#Grok3 can do #VLLM work too! Not sure how much you can use per day on a #Premiumアカウント, but it might be pretty good!? (lol)

Don't pay for closed source proprietary solutions that you get for free, with plain old open source. Support those who value honesty and transparency. FP8 #vLLM with @AMD #MI300x

Something we're doing differently this time around, we added a #vLLM track to #RaySummit! @vllm_project is one of the most popular inference engines, and is often used together with @raydistributed for scaling LLM inference. Can't wait to hear from these companies about how…
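
A common way the two compose is to shard work across GPU-pinned Ray actors, each owning its own vLLM engine; a minimal sketch (the actor layout and model name are illustrative, not the only way to combine Ray and vLLM):

```python
import ray
from vllm import LLM, SamplingParams


@ray.remote(num_gpus=1)
class VLLMWorker:
    """One vLLM engine pinned to one GPU (Ray sets CUDA_VISIBLE_DEVICES)."""

    def __init__(self, model: str):
        self.llm = LLM(model=model)

    def generate(self, prompts: list[str]) -> list[str]:
        outputs = self.llm.generate(prompts, SamplingParams(max_tokens=128))
        return [o.outputs[0].text for o in outputs]


ray.init()
workers = [VLLMWorker.remote("Qwen/Qwen2.5-7B-Instruct") for _ in range(2)]

prompts = [f"Question {i}: why is batching important?" for i in range(8)]
# Stride the batch across workers and gather the results.
futures = [w.generate.remote(prompts[i::len(workers)]) for i, w in enumerate(workers)]
print(sum(ray.get(futures), []))
```

At larger scale, vLLM can also use Ray directly as its distributed executor for multi-node tensor parallelism.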

Tried #Whisk. So it uses a #VLLM to turn images into prompts (you can also write prompts directly), mixes them prompt-style across subject, scene, and style, and renders the result with #Imagen3. Supports portrait, landscape, and square.

Good news for anyone who never quite got the hang of image-generation AI! Google's latest image-generation AI, "Whisk", has arrived!! Now you can surely create all those images you've been wanting to make!!! Get started here ↓ labs.google/whisk *For some features, using English is recommended.



#Cerebras just pruned 40% of GLM-4.6's 355B parameters, and it still codes like a beast. No custom stack needed: runs in #vLLM banandre.com/blog/2025-10/p…

Chat and image generation using this #VLLM #gemma-3-27b-it-qat-q4_0-gguf, photoreal. The 3rd image came out from just asking "what?"; the 2nd from adding "please have a woman who matches the mood stand in the foreground" to the English prompt (it's 27B, so Japanese is fine); then the 1st. Quite usable ♪ < With caption-style tools you might get the 2nd, but the 1st would have to be added by hand. #AI美女 #AIグラビア

Analyzed an image with the #VLLM #gemma-3-27b-it-qat-q4_0-gguf (on an #RTX3090), then used the resulting prompt to generate an image with #FLUX.1 [dev]. Nearly a perfect match ♪ #LM_Studio #OpenWebUI huggingface.co/google/gemma-3…
