
Simon Mo

@simon_mo_

@vllm_project

state of open source in Q4 2025: 🤖s are great first-PR reviewers in vLLM now!

🔥 This is a great read!! Exercise for the reader: how would Blackwell Ultra with 2x exponential cores impact the design of FA4? developer.nvidia.com/blog/inside-nv…

We reverse-engineered Flash Attention 4.



Day 1 with @modal notebook and it's so much fun! Switching from CPU to GPU easily between cells while maintaining environments and volumes is 🤌

* Run CPU nodes to download checkpoints and simple dev work with vLLM for testing
* Scale out to B200 when ready!

Simon Mo reposted

Just enabled full cudagraphs by default on @vllm_project! This change should offer a huge improvement for low-latency workloads on small models and efficient MoEs.

For Qwen3-30B-A3B-FP8 on H100 at bs=10 1024/128, I was able to see a speedup of 47% 🔥

It has been 1+ month of intense work! Now time to get some sleep 😴

Launching this model together with the amazing @vllm_project team was a real highlight for me! Follow this guide to launch gpt-oss in vLLM: blog.vllm.ai/2025/08/05/gpt…



I didn't expect the first section, "KV-cache hit rate is the single most important metric for a production-stage AI agent," but 🤯

After four overhauls and millions of real-world sessions, here are the lessons we learned about context engineering for AI agents: manus.im/blog/Context-E…



Long time in the making and I'm beyond excited about the future of vLLM!

PyTorch and vLLM are both critical to the AI ecosystem and are increasingly being used together for cutting edge generative AI applications, including inference, post-training, and agentic systems at scale. 🔗 Learn more about PyTorch → vLLM integrations and what’s to come:…



Simon Mo reposted

Announcing the first Codex open source fund grant recipients:
⬩ vLLM - inference serving engine @vllm_project
⬩ OWASP Nettacker - automated network pentesting @iotscan
⬩ Pulumi - infrastructure as code in any language @pulumicorp
⬩ Dagster - cloud-native data pipelines @dagster


😲 Super cool!!! Reminded me of Kevin's thesis "Structured Contexts For Large Language Models" — this is such a natural continuation of the idea.

We're excited to release our latest paper, “Sleep-time Compute: Beyond Inference Scaling at Test-Time”, a collaboration with @sea_snell from UC Berkeley and @Letta_AI advisors / UC Berkeley faculty Ion Stoica and @profjoeyg letta.com/blog/sleep-tim…



Simon Mo reposted

🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit! github.com/deepseek-ai/op…


Having been at every single vLLM meetup, I won't miss this one :D Looking forward to meeting all the vLLM users in Boston!

Friends from the East Coast! Join us on Tuesday, March 11 in Boston for the first ever East Coast vLLM Meetup. You will meet vLLM contributors from @neuralmagic, @RedHat, @Google, and more. Come share how you are using vLLM and see what's on the roadmap! lu.ma/7mu4k4xx



Simon Mo reposted

it's Catacter AI now 😼


Simon Mo reposted

Landed my first PR in @vllm_project 1 year ago today (github.com/vllm-project/v…) 38K LOC and 100+ PRs later and we are just getting started


Simon Mo reposted

Robert and I started contributing to vLLM around the same time, and today is my turn. Back then vLLM had only about 30 contributors. One year later, the project has received contributions from 800+ community members! And we're just getting started github.com/vllm-project/v…

