Lianmin Zheng

@lm_zheng

Member of technical staff @xAI | Prev: Ph.D. @UCBerkeley, Co-founder @lmsysorg

Lianmin Zheng reposted

(1/N) 🚀 We converted a high quality Wan2.2-MoE into an autoregressive model. Preview checkpoint: huggingface.co/FastVideo/Caus… - First autoregressive version of Wan2.2-A14B MoE - I2V compatible - 8-step distilled - Potential backbone for streaming generation and world modeling Try…


Lianmin Zheng reposted

🚀 Introducing Miles — an enterprise-facing RL framework for large-scale MoE training & production, forked from slime. Slime is a lightweight, customizable RL framework that already powers real post-training pipelines and large MoE runs. Miles builds on slime but focuses on new…


Lianmin Zheng reposted

We have implemented unified FP8 RL. FP8 for both training and rollout effectively eliminates train–inference inconsistency caused by quantization error, improving both the speed and stability of RL training.
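As a toy illustration of why unified precision removes the mismatch, here is a fake-quantization sketch. The int-grid rounding below is only a stand-in for real FP8 (e4m3) kernels, and none of this is the actual implementation:

```python
def fake_quant(x, bits=8, amax=1.0):
    """Symmetric fake quantization: snap x onto a low-precision grid,
    a stand-in for FP8 rounding, to illustrate quantization error."""
    levels = (2 ** (bits - 1)) - 1
    scale = amax / levels
    return round(x / scale) * scale

w = [0.1234567, -0.7654321, 0.3141592]   # toy weights
x = [1.0, 2.0, 3.0]                      # toy activations

# Mixed precision: train in high precision, roll out with quantized weights.
logit_train_hp = sum(wi * xi for wi, xi in zip(w, x))
logit_rollout = sum(fake_quant(wi) * xi for wi, xi in zip(w, x))
mismatch = abs(logit_train_hp - logit_rollout)   # nonzero: the two policies differ

# Unified precision: training sees the same quantized weights as rollout,
# so the train-inference gap caused by quantization error vanishes exactly.
logit_train_q = sum(fake_quant(wi) * xi for wi, xi in zip(w, x))
print(mismatch > 0, logit_train_q == logit_rollout)  # True True
```

The point of the sketch: the inconsistency is not noise to be tolerated but a deterministic artifact of running two different numeric programs, so making both sides run the same quantized program eliminates it.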


Lianmin Zheng reposted

We conducted additional experiments to compare speculative training with frozen MTP layers and obtained solid results. Further experiments are underway on larger-scale (300B+) MoEs; looking forward!


We introduce speculative decoding into the RL sampling process, achieving a significant improvement in sampling speed at appropriate batch sizes. Compared to freezing the draft model, the accepted length remains high, yielding stable long-term positive gains.
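The propose-and-verify loop behind speculative decoding can be sketched with toy integer-token models. This is greedy verification only; a real system verifies against the target distribution, and the draft/target lambdas below are invented for illustration:

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    # Draft model proposes k tokens autoregressively (greedy toy version).
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # Target verifies the proposal; keep the longest agreeing prefix plus one
    # corrected token from the target. len(accepted) is the "accepted length".
    accepted, ctx = [], list(prefix)
    for t in proposal:
        tgt = target_next(ctx)
        if tgt == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(tgt)   # target's correction ends the step
            break
    else:
        accepted.append(target_next(ctx))  # bonus token when all drafts accepted
    return accepted

# Toy models over integer tokens: target emits last+1; draft agrees except on
# every third position, where it emits last+2.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if len(ctx) % 3 else ctx[-1] + 2
print(speculative_step(draft, target, [0], k=4))  # [1, 2, 3]
```

Each step costs one target forward pass but yields multiple tokens, which is why keeping the accepted length high (rather than letting a frozen draft drift away from the policy) determines whether the speedup persists through training.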



Lianmin Zheng reposted

Grok 4.1 just released. You should notice a significant increase in speed and quality.

Introducing Grok 4.1, a frontier model that sets a new standard for conversational intelligence, emotional understanding, and real-world helpfulness. Grok 4.1 is available for free on grok.com, grok.x.com and our mobile apps. x.ai/news/grok-4-1



Grok 4.1 is out! Amazing improvements.

Introducing Grok 4.1, a frontier model that sets a new standard for conversational intelligence, emotional understanding, and real-world helpfulness. Grok 4.1 is available for free on grok.com, grok.x.com and our mobile apps. x.ai/news/grok-4-1



Numeric debugging is extremely challenging given the huge number of kernels in today’s training and inference systems. You can’t get a single kernel wrong. This is a great engineering achievement and will hopefully make RL training and debugging much easier.

💥 We've achieved perfect training-inference alignment for SGLang & FSDP in slime! (Flash Attn 3, DeepGEMM, etc.) The result? A strict KL divergence of 0. But here's the twist: We spent a month trying to find a baseline that crashes from mismatch... and couldn't. 🤷‍♂️ We haven't…
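The alignment check itself is simple to state: compute the KL divergence between the logprobs the training engine and the inference engine assign to the same tokens, and require it to be exactly zero, not merely small. A minimal sketch with hypothetical logprob vectors (not the actual test harness):

```python
import math

def kl_divergence(p_logprobs, q_logprobs):
    # KL(P || Q) over a token distribution, given log-probabilities from
    # the training engine (P) and the inference engine (Q).
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(p_logprobs, q_logprobs))

train_lp = [math.log(0.7), math.log(0.2), math.log(0.1)]

# Bitwise-identical kernels give bitwise-identical logprobs: KL is exactly 0.0.
infer_lp = list(train_lp)
print(kl_divergence(train_lp, infer_lp))  # 0.0

# Any kernel-level numeric drift, however tiny, makes the check strictly positive.
drift_lp = [train_lp[0] - 1e-6] + train_lp[1:]
print(kl_divergence(train_lp, drift_lp) > 0)  # True
```

A strict zero is a far stronger claim than a small KL: it means every kernel on both paths (attention, GEMMs, softmax, sampling) produces bit-identical numerics, which is exactly what makes it hard to achieve across hundreds of kernels.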



Lianmin Zheng reposted

💥 We've achieved perfect training-inference alignment for SGLang & FSDP in slime! (Flash Attn 3, DeepGEMM, etc.) The result? A strict KL divergence of 0. But here's the twist: We spent a month trying to find a baseline that crashes from mismatch... and couldn't. 🤷‍♂️ We haven't…


Great progress!

Insane Blackwell progress in v0.5.5 by the SGLang team. With new optimizations, it's stable like Hopper, and the performance is great even for multimodal models: 181 tokens/s on Qwen3-VL-30B-A3B-Thinking on 1x B200.



Hao has been pioneering efficient architecture research for many years. Always eager to see the innovations from him and his group!

Excited to partner with SGLang: FastVideo + SGLang = the future open ecosystem for diffusion. 🥳🫡 ----------- A few extra cents: since I started as faculty at UCSD, our lab has been investing in diffusion for both video and text, in both algorithms and systems. - Text-side, we…



Great progress!

Ant AQ-Team @AQ_MedAI @TheInclusionAI and SGLang RL Team @sgl_project just helped land Kimi-K2-Instruct RL on slime — fully wired up and running on 256× H20 141GB 🚀 Huge shout-out to @yngao016, @menlzy, @Yonah_x from AQ Team and @Ji_Li_233, @Yefei_RL from the SGLang RL Team for…



Lianmin Zheng reposted

🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API…


Future models will be multimodal-in, multimodal-out, potentially combining autoregressive and diffusion architectures. The SGLang project takes the first step toward building a unified inference stack for all.

🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API…



Lianmin Zheng reposted

Day-0 support for Kimi K2 Thinking on SGLang ⚡ The new open-source thinking-agent model pushes reasoning, coding, and multi-step tool use to new heights. Proud to collaborate with @Kimi_Moonshot to make it run seamlessly: python -m sglang.launch_server \ --model-path…


🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built…



Lianmin Zheng reposted

An awesome step toward native JAX inference in SGLang! Now we don't need any PyTorch layer to land code directly on TPU. @SingularMattrix

SGLang now has a pure JAX backend, and it runs natively on TPU!



SGLang now has a pure JAX backend, and it runs natively on TPU!

SGLang now runs natively on TPU with a new pure JAX backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses JAX to compile the model's forward pass. By combining SGLang and JAX, it delivers fast, native TPU inference while maintaining support for…



Lianmin Zheng reposted

Exciting to see Glyph open-sourced -- exploring a direction similar to DeepSeek-OCR, using visual-text compression to scale context windows! Instead of reading text token by token, Glyph lets models see the text, achieving 3–4× compression while preserving strong performance,…

Glyph: Scaling Context Windows via Visual-Text Compression Paper: arxiv.org/abs/2510.17800 Weights: huggingface.co/zai-org/Glyph Repo: github.com/thu-coai/Glyph Glyph is a framework for scaling the context length through visual-text compression. It renders long textual sequences into…
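A back-of-envelope calculation shows how rendering text can compress the context; every number below is an illustrative assumption, not Glyph's measured values:

```python
# Visual-text compression, back of the envelope.
# Assumptions (illustrative only): ~4 characters per BPE text token, a densely
# rendered page holding ~3500 characters, and a vision encoder that spends a
# fixed budget of ~256 tokens per page.
chars_per_page = 3500
chars_per_text_token = 4
vision_tokens_per_page = 256

text_tokens = chars_per_page / chars_per_text_token   # ~875 text tokens per page
compression = text_tokens / vision_tokens_per_page    # ~3.4x fewer tokens
print(round(compression, 1))  # 3.4
```

Under these assumptions the ratio lands in the 3-4x range the announcement quotes: the vision path pays a fixed per-page token budget regardless of how much text is packed onto the page.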



Lianmin Zheng reposted

FYI, our SGLang fork is public here: github.com/chutesai/sglang (branch chutes) to boost your SGLang perf from 73% to 97% 👀 When I tested a few manually there were a few discrepancies where instead of generating a string it generated an array of strings, etc.; curious if switching…

Kimi K2vv updated! We've added case-by-case statistics for ToolCall-Trigger Similarity and ToolCall-Schema Accuracy. Feedback is welcome! github.com/MoonshotAI/K2-…



Lianmin Zheng reposted

This huge contribution from the @Baidu_Inc team enabled multi-token prediction for sparse attention, achieving more than 2x decoding throughput improvements for the latest DeepSeek V3.2 models. The new architecture makes inference more interesting: we need to carefully handle the…

