
Junrong Lin

@OcssLin

MTS @Alibaba_Qwen on MLsys, building SGLang @lmsysorg | Prev. @DukeU

Junrong Lin reposted

Airbnb CEO Brian Chesky: “We’re relying a lot on Alibaba’s Qwen model. It’s very good. It’s also fast and cheap... We use OpenAI’s latest models, but we typically don’t use them that much in production because there are faster and cheaper models.” The Valley is built on Qwen?


Junrong Lin reposted

🚀 SGLang In-Depth Review of the NVIDIA DGX Spark is LIVE! Thanks to @NVIDIA’s early access program, SGLang makes its first-ever appearance in a consumer product, the brand-new DGX Spark. The DGX Spark’s 128GB Unified Memory and Blackwell architecture set a new standard for…
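As a rough illustration of why 128GB of unified memory matters for local serving, here is a back-of-envelope footprint calculator. This is only a sketch; the model size and dtypes are illustrative, not taken from the review:

```python
# Approximate weight memory for a dense model: params * bytes per param.
# KV cache comes on top and grows with batch size and context length.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1, "int4": 0.5}

def weight_footprint_gb(n_params_billion: float, dtype: str) -> float:
    """Rough weight memory in GB (ignores activations and KV cache)."""
    return n_params_billion * BYTES_PER_PARAM[dtype]

# A 70B model needs ~140 GB in fp16, ~70 GB in fp8, ~35 GB in int4,
# so 128 GB of unified memory fits fp8 weights with room for KV cache.
for dtype in ("fp16", "fp8", "int4"):
    print(f"70B @ {dtype}: ~{weight_footprint_gb(70, dtype):.0f} GB")
```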


Junrong Lin reposted

🧠For Qwen3-Next’s Day 0 support in SGLang, one tricky part was enabling spec decoding with the Hybrid Linear Model—since SSM & conv caches only store the last position (unlike KV cache). 🚀After tons of effort with @qingquan_song, we achieved >2× speedup! Benchmarks below
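To make the problem concrete, here is a toy sketch (hypothetical names, not SGLang's actual implementation) of why a recurrent cache needs checkpoint-and-replay during draft verification, while a KV cache can simply truncate rejected positions:

```python
import copy

class ToyRecurrentCache:
    """Stand-in for an SSM/conv cache: only the latest state is kept,
    so rejected positions can't be dropped by truncating the cache."""
    def __init__(self):
        self.state = 0  # placeholder for the recurrent hidden state

    def step(self, token: int):
        self.state = self.state * 31 + token  # toy state update

def verify_draft(cache, draft, accept):
    """Verify all draft tokens in one pass (as spec decoding does),
    then restore the cache to the accepted prefix."""
    checkpoint = copy.deepcopy(cache.state)  # snapshot before verifying
    for tok in draft:            # one batched forward in a real engine
        cache.step(tok)
    n_accepted = 0
    for tok in draft:            # find the accepted prefix
        if not accept(tok):
            break
        n_accepted += 1
    if n_accepted < len(draft):
        # A KV cache could simply truncate the rejected tail; a recurrent
        # state holds only the last position, so roll back and replay.
        cache.state = checkpoint
        for tok in draft[:n_accepted]:
            cache.step(tok)
    return n_accepted
```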


🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!

🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!)
🔹 Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…
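For intuition about the hybrid layout, here is a toy PyTorch sketch that interleaves linear-attention blocks (DeltaNet-style, fixed-size state) with full-attention blocks. The block internals and the interleaving period are placeholders, not the real Qwen3-Next modules or ratio:

```python
import torch.nn as nn

class LinearAttentionBlock(nn.Module):
    # Placeholder for a Gated DeltaNet-style block: O(n) in sequence
    # length, carrying a fixed-size recurrent state instead of a KV cache.
    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(d, d)
    def forward(self, x):
        return x + self.proj(x)

class FullAttentionBlock(nn.Module):
    # Placeholder for a gated softmax-attention block: O(n^2) but with
    # precise access to every past position.
    def __init__(self, d):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out

class HybridStack(nn.Module):
    """Toy hybrid stack: one full-attention block every `period` layers
    (the ratio here is illustrative only)."""
    def __init__(self, n_layers=8, d=64, period=4):
        super().__init__()
        self.layers = nn.ModuleList(
            FullAttentionBlock(d) if (i + 1) % period == 0
            else LinearAttentionBlock(d)
            for i in range(n_layers)
        )
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```

The trade the tweet points at: the linear blocks keep decode cost and cache size constant in sequence length, while the occasional full-attention block retains precise long-range recall.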



Special thanks to my old friends from the SGLang community, especially @hebiao064, @qingquan_song, and more (sorry, I don’t know their X accounts 🥹), who helped support the hybrid-model MTP. For linear attention, the eviction during the EAGLE verification phase is different from the regular…

Qwen3-Next is out! SGLang has supported it on day 0 with speculative decoding. Try it out 👇




Junrong Lin reposted

We’re live! 🎉 This is the official account for slime — an open-source, SGLang-native post-training framework for RL scaling. Kicking things off with our first milestone → v0.1.0 release 🧪 Blog: thudm.github.io/slime/blogs/re… Follow us to run RL faster ⚡️


cool

Grok videos can now talk. Major upgrade to image & video generation in a few weeks. This is still early beta.



😎 Glad to see the first project I participated in at Qwen finally out. More awesome work is coming

Big news: Introducing Qwen3-Max-Preview (Instruct) — our biggest model yet, with over 1 trillion parameters! 🚀 Now available via Qwen Chat & Alibaba Cloud API. Benchmarks show it beats our previous best, Qwen3-235B-A22B-2507. Internal tests + early user feedback confirm:…



Junrong Lin reposted

We are using SGLang at really large scale for RL, and it’s been working great :)

xAI may be one of the single biggest contributors to open-source inference just by serving everything with SGLang



Junrong Lin reposted

More on QAT:
1. QAT explanation: pytorch.org/blog/quantizat…
2. MXFP4 QAT is supported in NVIDIA ModelOpt: github.com/NVIDIA/TensorR…
3. A quick drawing of how gpt-oss is trained, in my understanding:


Junrong Lin reposted

🚀 Introducing the first OSS example of fine-tuning gpt-oss with MXFP4 QAT! Powered by NVIDIA ModelOpt + SGLang.

Highlights:
1. Fine-tune gpt-oss while keeping the original MXFP4 format
2. Preserve FP4 efficiency and recover accuracy
3. Deploy seamlessly with SGLang!

Full Blog👇
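For a sense of what MXFP4 QAT means mechanically, here is a simplified fake-quantization sketch with a straight-through estimator. It is a toy, not ModelOpt's implementation, and it assumes tensors whose size divides the 32-element block:

```python
import torch

# FP4 (E2M1) representable magnitudes, the MXFP4 element grid.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_mxfp4(w: torch.Tensor, block: int = 32) -> torch.Tensor:
    """Fake-quantize to MXFP4: each 32-element block shares a power-of-two
    scale and elements snap to the FP4 grid; gradients pass straight through."""
    wb = w.reshape(-1, block)  # assumes w.numel() % block == 0
    amax = wb.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    scale = torch.exp2(torch.ceil(torch.log2(amax / 6.0)))  # pow-2 scale
    scaled = wb / scale
    # Snap magnitudes to the nearest grid point, keep the sign.
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    q = FP4_GRID[idx] * scaled.sign() * scale
    # Straight-through estimator: forward sees q, backward sees identity.
    return (wb + (q - wb).detach()).reshape(w.shape)
```

Training with this applied to the weights lets the checkpoint stay in the deployment format, which is the "keep the original MXFP4 format" point above.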


chad

This false nomenclature of “researcher” and “engineer”, which is a thinly-masked way of describing a two-tier engineering system, is being deleted from @xAI today. There are only engineers. Researcher is a relic term from academia.



👀

something beyond your expectations



Junrong Lin reposted

✅ We’re excited to support @Alibaba_Qwen’s Qwen3-Coder on SGLang! With the tool-call parser and expert parallelism enabled, it runs smoothly with flexible configurations. Just give it a try! 🔗 github.com/zhaochenyang20…

>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
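As an illustration of the tool-call setup mentioned above, here is a hypothetical client sketch against SGLang's OpenAI-compatible endpoint. The port, model name, and tool schema are assumptions for illustration, not taken from the tweets:

```python
# Hypothetical client-side sketch: calling a locally served Qwen3-Coder
# through an OpenAI-compatible API. Assumes a server is already running
# (the port and model name may differ on your setup).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # made-up tool, for illustration only
        "description": "Run the project's test suite and return the output.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
    messages=[{"role": "user", "content": "Fix the failing test in src/."}],
    tools=tools,
)
# With a tool-call parser enabled server-side, structured calls come back
# in tool_calls rather than as raw text.
print(resp.choices[0].message.tool_calls)
```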



Junrong Lin reposted

Bye Qwen3-235B-A22B, hello Qwen3-235B-A22B-2507! After talking with the community and thinking it through, we decided to stop using hybrid thinking mode. Instead, we’ll train Instruct and Thinking models separately so we can get the best quality possible. Today, we’re releasing…


Junrong Lin reposted

salute

🤯A lot of folks have asked me: how does SGLang iterate so fast? PD disaggregation: ~3 weeks. Large-scale EP: ~1 month. GB200 NVL72 support? Also ~3 weeks. And usually with fewer than 5 core devs involved. SGLang moves fast because of talent like this: ~70k commits in a year!🔥



Junrong Lin reposted

Meet Qwen-VLo, your AI creative engine:
• Concept-to-Polish: Turn rough sketches or text prompts into high-res visuals
• On-the-Fly Edits: Refine product shots, adjust layouts or styles with simple commands
• Global-Ready: Generate images in multiple languages
• Progressive…


Junrong Lin reposted

We're excited to release OME, which is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs). It optimizes the deployment and operation of LLMs by automating model management, intelligent runtime selection, efficient resource…

