
Yong Wu

@yongwwwml

Yong Wu reposted @tqchenml

📢 Excited to introduce Apache TVM FFI, an open ABI and FFI for ML systems, enabling compilers, libraries, DSLs, and frameworks to naturally interoperate with each other. Ship one library across PyTorch, JAX, CuPy, etc., runnable from Python, C++, and Rust. tvm.apache.org/2025/10/21/tvm…
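For context on the interop idea described above: TVM FFI exchanges tensors through the DLPack protocol, so a function written once against DLPack can accept tensors from PyTorch, CuPy, and other frameworks without copies. The snippet below is a minimal, hypothetical sketch of that zero-copy exchange using only the public DLPack APIs of PyTorch and CuPy; the function name scale_inplace is invented for illustration, and this is not the TVM FFI API itself, only the underlying exchange mechanism it builds on.

# Illustrative sketch of zero-copy tensor exchange via DLPack (not the
# tvm-ffi API). Requires a CUDA GPU, PyTorch, and CuPy.
import torch
import cupy as cp

def scale_inplace(tensor_like, factor: float) -> None:
    """Scale any object implementing __dlpack__ in place, via CuPy."""
    # cp.from_dlpack wraps the producer's GPU buffer without copying,
    # so the write below is visible to the original framework's tensor.
    arr = cp.from_dlpack(tensor_like)
    arr *= factor

x = torch.ones(4, device="cuda")
scale_inplace(x, 2.0)   # same code path works on a PyTorch tensor...
y = cp.ones(4)
scale_inplace(y, 2.0)   # ...and on a CuPy array, unchanged
print(x, y)             # both now hold 2.0 everywhere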


Yong Wu reposted @shanli_xing

🤔 Can AI optimize the systems it runs on?

🚀 Introducing FlashInfer-Bench, a workflow that makes AI systems self-improving with agents:

- Standardized signature for LLM serving kernels
- Implement kernels with your preferred language
- Benchmark them against real-world serving…


Yong Wu reposted @lmsysorg

The SGLang team just ran DeepSeek 671B on NVIDIA’s GB200 NVL72, unlocking 7,583 toks/sec/GPU for decoding w/ PD disaggregation + large-scale expert parallelism — 2.7× faster than H100. Don’t miss this work! 🔥 Thanks to Pen Li from NVIDIA who kicked off this collaboration and…


Yong Wu reposted

We’re thrilled that FlashInfer won a Best Paper Award at MLSys 2025! 🎉 This wouldn’t have been possible without the community — huge thanks to @lmsysorg’s sglang for deep co-design (which is critical for inference kernel evolution) and stress-testing over the years, and to…

🎉 Congratulations to the FlashInfer team – their technical paper, "FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving," just won best paper at #MLSys2025. 🏆 🙌 We are excited to share that we are now backing FlashInfer – a supporter and…


