
Yu Zhang 🐈🐙

@yzhang_cs

@Kimi_Moonshot; PhD Student @ Soochow University; working on efficient methods for LLMs; disciple of parallel programming; INTP

Yu Zhang 🐈🐙 reposted

Weight Decay and Learning Rate from an EMA View kexue.fm/archives/11459 Derives optimal WD and LR schedules from this perspective.
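The gist of the EMA reading, sketched with assumed notation (the linked post carries the actual derivation): with decoupled weight decay, one optimizer step is

```latex
\[
\theta_t = (1-\eta_t\lambda)\,\theta_{t-1} - \eta_t u_t
         = \beta_t\,\theta_{t-1} + (1-\beta_t)\!\left(-\frac{u_t}{\lambda}\right),
\qquad \beta_t := 1-\eta_t\lambda ,
\]
```

so the weights behave like an exponential moving average of the scaled updates -u_t/λ, and the product η_t·λ sets the effective averaging horizon; schedules for WD and LR then follow from choosing how that horizon should evolve during training.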


Yu Zhang 🐈🐙 reposted

语言即世界工作室 (the "Language Is World" studio) has partnered with Weibo on a brand-new program, 《未竟之约》. In ever-changing time and space, expression and observation have no starting point and no end; setting out from this moment, let's head into the unfinished journey together 🎧🎧🎧


Yu Zhang 🐈🐙 reposted
#NeurIPS2025

Yu Zhang 🐈🐙 reposted

Hope you enjoyed yesterday’s poster! We were honored to have @SonglinYang4, @Xinyu2ML, @wen_kaiyue, and many other esteemed researchers visit and share their guidance! 🚀


🧐🧐🧐

I am the greatest scientist in China in the last 100 years.



Yu Zhang 🐈🐙 reposted

I will talk about recent developments in linear attention, like GLA, DeltaNet, GDN, and KDA; the application of linear attention in the latest frontier models like Qwen3-Next and Kimi Linear; its strong coherence with test-time learning; and the exciting future ahead! Also looking…

We're super excited to host our fourth NeurIPS reading group with @Yikang_Shen, including lunch sponsored by @StrikerVP! We'll gather 25-30 researchers to discuss Linear Attention and DeltaNet, including their usage in recent models like Qwen3-Next and Kimi Linear. RSVP below!
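For anyone skimming along, a minimal reference sketch of the delta-rule recurrence behind DeltaNet-style linear attention (sequential form; shapes and names here are illustrative, and real implementations run this chunkwise on GPU):

```python
import torch

def delta_rule(q, k, v, beta):
    """Sequential delta-rule recurrence (DeltaNet-style linear attention).

    q, k: (T, d_k) queries/keys; v: (T, d_v) values; beta: (T,) write strengths.
    The state S is a fast-weight matrix updated by an online delta rule:
        S_t = S_{t-1} + beta_t * (v_t - S_{t-1} k_t) k_t^T
    which *replaces* the value stored under key k_t rather than accumulating
    it -- the key difference from vanilla linear attention.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)
    out = []
    for t in range(T):
        v_old = S @ k[t]                                    # value currently bound to k_t
        S = S + beta[t] * torch.outer(v[t] - v_old, k[t])   # delta-rule write
        out.append(S @ q[t])                                # read with the query
    return torch.stack(out)
```

GLA, GDN, and KDA vary this template, e.g. applying (data-dependent) decay to S before the write; the production kernels compute the same recurrence tile by tile so it parallelizes.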



Yu Zhang 🐈🐙 reposted

I will go to NeurIPS 2025 @ San Diego during Dec. 2-7 for my spotlight paper "Tensor Product Attention Is All You Need", and I'm also excited to meet all of you there to discuss anything interesting, exciting, and enlightening about AI, LLMs, and the next trend of innovation.


Yu Zhang 🐈🐙 reposted

Details on the trinity architecture that we settled on!


Yu Zhang 🐈🐙 reposted

Open source is probably the new or final form of publication for academia.


Yu Zhang 🐈🐙 reposted

If Gemini-3 proved continual scaling of pretraining, DeepSeek-V3.2-Speciale proves scaling RL with large context. We spent a year pushing DeepSeek-V3 to its limits. The lesson: post-training bottlenecks are solved by refining methods and data, not just waiting for a better base.

🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech…



Yu Zhang 🐈🐙 reposted

🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech…


Yu Zhang 🐈🐙 reposted

let's appreciate this model name :)


Yu Zhang 🐈🐙 reposted

enough preamble. The most important part in every Whale paper, as I've said so many times over these years, is “Conclusion, Limitation, and Future Work”. They say: Frontier has no knowledge advantage. Compute is the only serious differentiator left. Time to get more GPUs.


Model size still feels pretty fundamental. I was stunned by Gemini (几米奶老师) today: it completed the derivation of KDA's parallel form almost without a single error, orz


Yu Zhang 🐈🐙 reposted

Congrats to @Alibaba_Qwen! Great to hear that our attention-sink study from two years ago led to a strong architecture improvement and more stable model training 😀

🏆 We are incredibly honored to announce that our paper, "Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free" has received the NeurIPS 2025 Best Paper Award! A huge congratulations to our dedicated research team for pushing the boundaries…
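Roughly, the mechanism: a query-dependent sigmoid gate multiplies the attention output per head, adding non-linearity and letting heads suppress their own output instead of dumping probability on a sink token. A sketch in the spirit of the paper, not its exact design (the projection modules here, e.g. `g_proj`, are placeholder `nn.Linear(D, D)` layers):

```python
import torch
import torch.nn.functional as F

def gated_attention(x, q_proj, k_proj, v_proj, g_proj, o_proj, n_heads):
    """Sigmoid-gated attention output (illustrative sketch).

    x: (B, T, D). q/k/v/g/o_proj: D -> D linear maps. The gate is computed
    from the same token's hidden state and applied elementwise to each
    head's SDPA output before the output projection.
    """
    B, T, D = x.shape
    hd = D // n_heads

    def split(t):  # (B, T, D) -> (B, H, T, hd)
        return t.view(B, T, n_heads, hd).transpose(1, 2)

    q, k, v = split(q_proj(x)), split(k_proj(x)), split(v_proj(x))
    attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    gate = torch.sigmoid(split(g_proj(x)))        # per-head elementwise gate
    out = (attn * gate).transpose(1, 2).reshape(B, T, D)
    return o_proj(out)
```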



Yu Zhang 🐈🐙 reposted

Tencent YouTu Lab just dropped SSA: Sparse Sparse Attention for efficient LLM processing. This new framework for long-context inference achieves state-of-the-art results by explicitly encouraging sparser attention distributions, outperforming existing methods in perplexity across huge…
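The tweet cuts off, but "encouraging sparser attention distributions" generally means adding a regularizer over the attention rows to the training loss. As a generic illustration only (not SSA's actual objective), an entropy penalty would look like:

```python
import torch

def attention_entropy_penalty(attn_probs, eps=1e-9):
    """Generic sparsity regularizer on attention rows (illustration only).

    attn_probs: (..., T_q, T_k), rows sum to 1 after softmax. Lower row
    entropy means sparser, more peaked attention; add coeff * penalty
    to the training loss to push distributions toward sparsity.
    """
    ent = -(attn_probs * (attn_probs + eps).log()).sum(dim=-1)
    return ent.mean()
```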


Sleep at 6:00 a.m. 👻

Healthy sleep schedules:
Sleep at 8:00PM → Wake up at 3:30AM
Sleep at 8:30PM → Wake up at 4:00AM
Sleep at 9:00PM → Wake up at 4:30AM
Sleep at 9:30PM → Wake up at 5:00AM
Sleep at 10:00PM → Wake up at 5:30AM
Sleep at 10:30PM → Wake up at 6:00AM
Sleep at 11:00PM → Wake up at…



Yu Zhang 🐈🐙 reposted

Interested in how we can use ideas from Flash Attention for more efficient linear RNN kernels? I am heading to NeurIPS in San Diego to present our work on Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels.
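The core tiling idea such kernels build on, in its simplest ungated form (a sketch under assumed shapes, not the TFLA kernels themselves): split the sequence into tiles, carry contributions from past tiles through a running state, and handle the current tile with a small causally masked attention, so everything reduces to tile-sized matmuls.

```python
import torch

def chunked_linear_attention(q, k, v, chunk=64):
    """Chunkwise (tiled) causal linear attention, sketch form.

    Computes o_t = q_t @ (sum_{s<=t} k_s^T v_s) tile by tile:
    past tiles enter through the running state S = sum K^T V;
    the current tile uses a small causally masked (Q K^T) V.
    q, k: (T, d_k); v: (T, d_v).
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = torch.zeros(d_k, d_v)
    mask = torch.tril(torch.ones(chunk, chunk))
    out = torch.empty(T, d_v)
    for s in range(0, T, chunk):
        e = min(s + chunk, T)
        Q, K, V = q[s:e], k[s:e], v[s:e]
        intra = ((Q @ K.T) * mask[: e - s, : e - s]) @ V  # within-tile, causal
        out[s:e] = Q @ S + intra                          # past tiles via state
        S = S + K.T @ V                                   # fold tile into state
    return out
```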


Yu Zhang 🐈🐙 reposted

Lol what a shitshow @iclr_conf I'm sure the new ACs will take the rebuttals into account in a meaningful way when they decide to keep the original scores. What a waste of effort for everyone who spent time on rebuttals, and what a stupid reaction to the leak 🤦


Yu Zhang 🐈🐙 reposted

CuteDSL 4.3.1 is here 🚀 Major host-overhead optimization (10–40µs down to ~2µs in hot loops), streamlined PyTorch interop (pass torch.Tensors directly, no more conversions needed), and export and use in more languages and envs. All powered by the Apache TVM-FFI ABI.

