elie
@eliebakouch
Training LLMs (now: @huggingface). Anon feedback: https://www.admonymous.co/eliebakouch

Pinned

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably huggingface.co/spaces/Hugging…

elie reposted

Amazing pairing to learn information theory

Blog from Olah which gives great visual intuition:
colah.github.io/posts/2015-09-…

Video from 3b1b where you see the power by solving a real world example, Wordle: youtube.com/watch?v=v68zYy…
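The idea both resources build on: a Wordle guess is good when the entropy of its feedback distribution is high. A toy, purely illustrative sketch (not from either resource, and with simplified feedback rules):

```python
import math
from collections import Counter

def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def feedback_pattern(guess, answer):
    """Simplified Wordle feedback: 'g' = right letter/right spot, 'y' = in word, '.' = absent."""
    return "".join(
        "g" if g == a else ("y" if g in answer else ".")
        for g, a in zip(guess, answer)
    )

def expected_information(guess, candidate_answers):
    """Expected bits gained by playing `guess` = entropy of its feedback distribution."""
    counts = Counter(feedback_pattern(guess, answer) for answer in candidate_answers)
    total = len(candidate_answers)
    return entropy(count / total for count in counts.values())

# Toy example: "crane" splits this tiny pool into several patterns, "zzzzz" tells you nothing.
candidates = ["crane", "crate", "slate", "stale", "plate"]
for guess in ["crane", "zzzzz"]:
    print(guess, round(expected_information(guess, candidates), 3))
```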

elie reposted

Our infra engineer shared a great article “Why Kimi Chose INT4.” I asked if he could be on Twitter, but he’s shy and prefers to be the man behind Kimi. :)

🚀 "Quantization is not a compromise — it's the next paradigm." After K2-Thinking's release, many developers have been curious about its native INT4 quantization format. 刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…

ZhihuFrontier's tweet image. 🚀 "Quantization is not a compromise — it's the next paradigm."
After K2-Thinking's release, many developers have been curious about its native INT4 quantization format.
刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…
ZhihuFrontier's tweet image. 🚀 "Quantization is not a compromise — it's the next paradigm."
After K2-Thinking's release, many developers have been curious about its native INT4 quantization format.
刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…
ZhihuFrontier's tweet image. 🚀 "Quantization is not a compromise — it's the next paradigm."
After K2-Thinking's release, many developers have been curious about its native INT4 quantization format.
刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…
ZhihuFrontier's tweet image. 🚀 "Quantization is not a compromise — it's the next paradigm."
After K2-Thinking's release, many developers have been curious about its native INT4 quantization format.
刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…
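For intuition on what INT4 weight-only quantization means mechanically, here is a minimal sketch of the standard symmetric per-group recipe (not Kimi's actual implementation; real INT4 kernels also pack two 4-bit values per byte, which is skipped here for clarity):

```python
import torch

def quantize_int4_weight_only(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group INT4 quantization of a 2D weight matrix.
    Values live in [-8, 7]; int8 storage is used here instead of bit-packing for simplicity."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    w_groups = w.reshape(out_features, in_features // group_size, group_size)
    # One scale per group, chosen so the max magnitude maps to the INT4 limit (7).
    scales = (w_groups.abs().amax(dim=-1, keepdim=True) / 7.0).clamp_min(1e-8)
    q = torch.clamp(torch.round(w_groups / scales), -8, 7).to(torch.int8)
    return q, scales

def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() * scales).reshape(q.shape[0], -1)

# Round-trip error on a random "expert" weight stays small relative to the weights themselves.
w = torch.randn(256, 1024)
q, scales = quantize_int4_weight_only(w)
print("mean abs error:", (w - dequantize(q, scales)).abs().mean().item())
```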


making memes like this should be a full-time job

[ENG SUB] how it feels to use eager pytorch in 2025




same model, same settings, just diff providers



nevermind.. sorry openai i wasn't familiar with your game


imagine openai official account answering "awesome!" on claude sonnet 4.5 release



elie reposted

you must feel the gradients flowing through you! (edit: accidentally added shampoo before, sorry @kellerjordan0)


I don't think other tech reports mention bf16 training / fp8 inference for RL training, right?

before int4, it was bf16 training + fp8 inference, so the discrepancy is not greater



elie reposted

Interesting that the quantization is applied to the routed experts but not to the shared one. My understanding is that the shared expert has plenty of time to compute (Meituan fit in two whole shared expert layers during one MoE step with ScMoE) which is probably why.

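To make the routed-vs-shared distinction concrete, here is a rough, purely illustrative MoE block where one shared expert always runs (and could stay high precision) while tokens are dispatched to top-k routed experts (the weights INT4 targets). This is not Moonshot's or Meituan's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Minimal MoE block: one always-on shared expert + top-k routed experts."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):  # x: [tokens, d_model]
        # Shared expert sees every token; its compute is dense and easy to overlap.
        out = self.shared(x)
        # Routed experts: each token only visits its top-k experts.
        scores = F.softmax(self.router(x), dim=-1)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
        return out

print(ToyMoE()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```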

elie reposted

Insane how far the open source frontier has come

🚀 Hello, Kimi K2 Thinking!
The Open-Source Thinking Agent Model is here.

🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200–300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window

Built…


> "200-300 sequential tool calls" this is really the impressive part of this release imo, can't wait to see how they did it

eliebakouch's tweet image. > "200-300 sequential tool calls"

this is really the impressive part of this release imo, can't wait to see how they did it



imagine openai official account answering "awesome!" on claude sonnet 4.5 release

Awesome!



the scores are insane, very cool to see native int4 quantization for the MoE layers

> To overcome this challenge, we adopt Quantization-Aware Training (QAT) during the post-training phase, applying INT4 weight-only quantization to the MoE components. It allows K2 Thinking to…

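QAT here means the forward pass sees INT4-rounded expert weights while gradients flow as if the rounding weren't there (a straight-through estimator). A hypothetical minimal sketch of that fake-quant trick, not the actual K2 training code:

```python
import torch

class FakeQuantInt4(torch.autograd.Function):
    """Forward: snap weights to a symmetric INT4 grid. Backward: straight-through (identity)."""

    @staticmethod
    def forward(ctx, w, group_size):
        shape = w.shape
        g = w.reshape(-1, group_size)
        # One scale per group so the largest magnitude maps to the INT4 limit (7).
        scale = g.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 7.0
        q = torch.clamp(torch.round(g / scale), -8, 7)
        return (q * scale).reshape(shape)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pretend quantization was the identity.
        return grad_output, None

# One training step, sketched: the loss is computed through INT4-rounded weights,
# while updates land on the full-precision master weights.
w = torch.randn(16, 64, requires_grad=True)   # stands in for an MoE expert weight
x = torch.randn(8, 64)
w_q = FakeQuantInt4.apply(w, 32)
loss = (x @ w_q.t()).pow(2).mean()
loss.backward()
print(w.grad.shape)  # gradients reach the full-precision master weights
```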



elie reposted

This feels like a big moment

we’re very close to 50% on HLE, and bonus point: it’s with an open model :)



ok we're at 51% with "heavy" mode

> Heavy Mode: K2 Thinking Heavy Mode employs an efficient parallel strategy: it first rolls out eight trajectories simultaneously, then reflectively aggregates all outputs to generate the final result.

we’re very close to 50% on HLE, and bonus point: it’s with an open model :)
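Mechanically, that heavy mode is sample-n-then-aggregate: generate several independent trajectories in parallel, then ask the model once more to reconcile them into a final answer. A rough, hypothetical sketch against a generic OpenAI-compatible chat endpoint (the client and model id are placeholders, not Moonshot's actual setup):

```python
import asyncio
from openai import AsyncOpenAI  # assumes an OpenAI-compatible provider

client = AsyncOpenAI()          # reads OPENAI_API_KEY / OPENAI_BASE_URL from the environment
MODEL = "kimi-k2-thinking"      # placeholder model id

async def rollout(question: str) -> str:
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": question}],
        temperature=1.0,  # keep trajectories diverse and independent
    )
    return resp.choices[0].message.content

async def heavy_mode(question: str, n: int = 8) -> str:
    # 1) Roll out n trajectories simultaneously.
    drafts = await asyncio.gather(*(rollout(question) for _ in range(n)))
    # 2) Reflectively aggregate: reconcile all drafts into one final answer.
    joined = "\n\n".join(f"[Trajectory {i + 1}]\n{d}" for i, d in enumerate(drafts))
    resp = await client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Compare the candidate solutions, resolve disagreements, and give one final answer."},
            {"role": "user", "content": f"Question: {question}\n\nCandidates:\n{joined}"},
        ],
    )
    return resp.choices[0].message.content

# print(asyncio.run(heavy_mode("What is 17 * 24?")))  # needs an API key and endpoint
```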


we’re very close to 50% on HLE, and bonus point: it’s with an open model :)


From the original HLE paper:


