Fusheng Liu
@mathlfs
PhD student @ National University of Singapore
mathematicallfs@gmail.com

Fusheng Liu reposted

@Kimi_Moonshot:
🚀 Introducing our new tech report: Muon is Scalable for LLM Training

We found that the Muon optimizer can be scaled up using the following techniques:
• Adding weight decay
• Carefully adjusting the per-parameter update scale

✨ Highlights:
• ~2x computational efficiency vs AdamW…
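As a reading aid, here is a minimal NumPy sketch of the two tweaks named above as I understand them: decoupled weight decay folded into the Muon update, and a shape-aware per-parameter scale so the update RMS stays comparable to AdamW's. The Newton-Schulz coefficients follow Keller Jordan's open-source Muon; the 0.2·sqrt(max(fan_out, fan_in)) factor is my assumption of what "per-parameter update scale" means here, not a quote from the report.

```python
import numpy as np

def newton_schulz_orth(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G using the quintic Newton-Schulz
    iteration from the open-source Muon implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + eps)      # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.1):
    """One Muon-style step on a 2-D weight matrix.

    Assumed details (my reading, not quoted from the report):
    - decoupled, AdamW-style weight decay: W <- W - lr * wd * W
    - per-parameter scale 0.2 * sqrt(max(fan_out, fan_in)) so the update
      RMS roughly matches AdamW's and AdamW learning rates can be reused.
    """
    momentum = beta * momentum + grad          # heavy-ball momentum buffer
    O = newton_schulz_orth(momentum)           # orthogonalized update direction
    scale = 0.2 * np.sqrt(max(W.shape))        # shape-aware rescaling (assumption)
    W = W - lr * (scale * O + weight_decay * W)
    return W, momentum

# toy usage: one 256x512 linear layer with a random gradient
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(256, 512))
m = np.zeros_like(W)
W, m = muon_step(W, rng.normal(size=W.shape), m)
```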

Fusheng Liu reposted

@deepseek_ai:
🚀 Introducing DeepSeek-V3!

Biggest leap forward yet:
⚡ 60 tokens/second (3x faster than V2!)
💪 Enhanced capabilities
🛠 API compatibility intact
🌍 Fully open-source models & papers

🐋 1/n

Highly recommend this user-friendly project if you are starting with LM pretraining and want to build your own model/optimizer. The repo is easy to understand, easy to edit, and easy to implement new ideas in with minimal effort. Well done Keller! Looking forward to your records on ViT :)

I enjoy getting NanoGPT training speed records. I’m also interested in making my formulation of NanoGPT speedrunning an accessible benchmark on which other people find it easy to try new ideas. To that end, I have tried to keep the code of the current record short, and minimize…



Fusheng Liu reposted

@danielhanchen:
Fixed a bug which caused all training losses to diverge for large gradient accumulation sizes.

1. First reported by @bnjmn_marie, GA is supposed to be mathematically equivalent to full batch training, but losses did not match.
2. We reproed the issue, and further investigation…
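For readers who have not hit this: a minimal sketch of one way gradient accumulation can silently drift from full-batch training, namely averaging per-micro-batch mean losses when the micro-batches contain different numbers of tokens. (That this normalization detail is the bug fixed above is my reading of the thread, not a confirmed statement.)

```python
import numpy as np

rng = np.random.default_rng(0)

# token-level losses for 4 micro-batches of *different* lengths
micro_batches = [rng.random(n) for n in (3, 5, 17, 7)]

# full-batch objective: mean loss over ALL tokens
full_batch = np.concatenate(micro_batches).mean()

# naive accumulation: average of per-micro-batch means
naive_ga = np.mean([mb.mean() for mb in micro_batches])

# corrected accumulation: sum the losses, divide once by the total token count
weighted_ga = sum(mb.sum() for mb in micro_batches) / sum(len(mb) for mb in micro_batches)

print(f"full batch  : {full_batch:.6f}")
print(f"naive GA    : {naive_ga:.6f}  (biased when lengths differ)")
print(f"weighted GA : {weighted_ga:.6f}  (matches full batch)")
assert np.isclose(weighted_ga, full_batch)
```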

Mamba at ICLR :)


Fusheng Liu reposted

@TacoCohen:
Harm's Law of Smol Models (HLSM) tells us how much we need to scale up the data size (k_D) as we scale down the model size (k_N), if we wish to preserve the loss of a Chinchilla-optimal model.
harmdevries.com/post/model-siz…
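To make the law concrete, here is a small numerical sketch assuming the Chinchilla parametric loss L(N, D) = E + A·N^(-α) + B·D^(-β), which is what the linked post builds on: the required data multiplier k_D solves L(k_N·N, k_D·D) = L(N, D). The constants and the operating point below are illustrative assumptions from memory of the Chinchilla fits, not values taken from the post.

```python
import numpy as np

# Chinchilla-style parametric loss L(N, D) = E + A * N**-alpha + B * D**-beta
# (constants roughly as fitted by Hoffmann et al.; treat them as illustrative)
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def k_data(k_N, N, D):
    """Data multiplier k_D keeping L(k_N*N, k_D*D) equal to L(N, D)."""
    bracket = 1.0 + (A * N**-alpha) / (B * D**-beta) * (1.0 - k_N**-alpha)
    if bracket <= 0:
        raise ValueError("model too small: no amount of data recovers the loss")
    return bracket ** (-1.0 / beta)

# illustrative Chinchilla-ish operating point: 70B params, 1.4T tokens
N, D = 70e9, 1.4e12
for k_N in (0.75, 0.5, 0.3):
    print(f"k_N = {k_N:.2f}  ->  k_D ~ {k_data(k_N, N, D):.2f}")
```

With these numbers, halving the model size (k_N = 0.5) asks for roughly 1.7x the data, and shrinking to 30% asks for roughly 3x, which is the qualitative trade-off the tweet is pointing at.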

Fusheng Liu reposted

Afraid of sum-of-squares (SOS) relaxations? Read this new blog post for a smooth ride in the Fourier domain. francisbach.com/sums-of-square…

