
Simon Mo

@simon_mo_

@vllm_project

state of open source in Q4 2025: 🤖s are great first-PR reviewers in vLLM now!

🔥 This is a great read!! Exercise for the reader: how would Blackwell Ultra with 2x exponential cores impact the design of FA4? developer.nvidia.com/blog/inside-nv…

We reverse-engineered Flash Attention 4.



Day 1 with @modal notebook and it's so much fun! Switching from CPU to GPU easily between cells while maintaining environments and volumes is 🤌

* Run CPU nodes to download checkpoints and simple dev work with vLLM for testing
* Scale out to B200 when ready!

Simon Mo reposted

Just enabled full cudagraphs by default on @vllm_project! This change should offer a huge improvement for low-latency workloads on small models and efficient MoEs.

For Qwen3-30B-A3B-FP8 on H100 at bs=10 1024/128, I was able to see a speedup of 47% 🔥

It has been 1+ month of intense work! Now time to get some sleep 😴

Launching this model together with the amazing @vllm_project team was a real highlight for me! Follow this guide to launch gpt-oss in vLLM: blog.vllm.ai/2025/08/05/gpt…



I didn't expect the first section, "KV-cache hit rate is the single most important metric for a production-stage AI agent," but 🤯

After four overhauls and millions of real-world sessions, here are the lessons we learned about context engineering for AI agents: manus.im/blog/Context-E…



Long time in the making and I'm beyond excited about the future of vLLM!

PyTorch and vLLM are both critical to the AI ecosystem and are increasingly being used together for cutting edge generative AI applications, including inference, post-training, and agentic systems at scale. 🔗 Learn more about PyTorch → vLLM integrations and what’s to come:…



Simon Mo reposted

Announcing the first Codex open source fund grant recipients:
⬩ vLLM - inference serving engine @vllm_project
⬩ OWASP Nettacker - automated network pentesting @iotscan
⬩ Pulumi - infrastructure as code in any language @pulumicorp
⬩ Dagster - cloud-native data pipelines @dagster


😲 Super cool!!! Reminded me of Kevin's thesis "Structured Contexts For Large Language Models" — this is such a natural continuation of the idea.

We're excited to release our latest paper, “Sleep-time Compute: Beyond Inference Scaling at Test-Time”, a collaboration with @sea_snell from UC Berkeley and @Letta_AI advisors / UC Berkeley faculty Ion Stoica and @profjoeyg letta.com/blog/sleep-tim…



Simon Mo reposted

🙏 @deepseek_ai's highly performant inference engine is built on top of vLLM. Now they are open-sourcing the engine the right way: instead of a separate repo, they are bringing changes to the open source community so everyone can immediately benefit! github.com/deepseek-ai/op…


Having been at every single vLLM meetup, I won't miss this one :D Looking forward to meeting all the vLLM users in Boston!

Friends from the East Coast! Join us on Tuesday, March 11 in Boston for the first ever East Coast vLLM Meetup. You will meet vLLM contributors from @neuralmagic, @RedHat, @Google, and more. Come share how you are using vLLM and see what's on the roadmap! lu.ma/7mu4k4xx



Simon Mo reposted

it's Catacter AI now 😼


Simon Mo reposted

Landed my first PR in @vllm_project 1 year ago today (github.com/vllm-project/v…) 38K LOC and 100+ PRs later and we are just getting started


Simon Mo reposted

Robert and I started contributing to vLLM around the same time, and today is my turn. Back then vLLM had only about 30 contributors. One year later, the project has received contributions from 800+ community members! And we're just getting started github.com/vllm-project/v…

