elie
@eliebakouch
Training LLMs (now: @huggingface) anon feedback: https://www.admonymous.co/eliebakouch
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn't, and how to make it run reliably huggingface.co/spaces/Hugging…
Amazing pairing to learn information theory: a blog from Olah that gives great visual intuition: colah.github.io/posts/2015-09-… and a video from 3b1b where you see its power by solving a real-world example, Wordle: youtube.com/watch?v=v68zYy…
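To anchor both links with one number: the core quantity is Shannon entropy, and the Wordle video is about picking the guess with the highest expected information. A minimal sketch of just that quantity (my illustration, not code from either resource):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the expected surprisal -log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries a full bit per flip; a heavily biased one much less.
print(entropy([0.5, 0.5]))    # 1.0
print(entropy([0.99, 0.01]))  # ~0.081
# Wordle angle: a good guess is one whose feedback-pattern distribution
# has high entropy, i.e. whatever feedback you get is maximally informative.
```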
Our infra engineer shared a great article “Why Kimi Chose INT4.” I asked if he could be on Twitter, but he’s shy and prefers to be the man behind Kimi. :)
🚀 "Quantization is not a compromise — it's the next paradigm." After K2-Thinking's release, many developers have been curious about its native INT4 quantization format. 刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…
making memes like this should be a full-time job
nevermind… sorry OpenAI, I wasn't familiar with your game
imagine the official OpenAI account replying "awesome!" to the Claude Sonnet 4.5 release
you must feel the gradients flowing through you! (edit: accidentally added Shampoo before, sorry @kellerjordan0)
I don't think other tech reports mention bf16 training / fp8 inference for RL training, right?
before INT4, it was bf16 training + fp8 inference, so the discrepancy is no greater than before
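To make that discrepancy concrete: a rough sketch of a bf16-trained weight round-tripped through fp8 for serving. This is my illustration, not Kimi's stack, and it assumes a PyTorch build with float8_e4m3fn support:

```python
import torch

w_train = torch.randn(1024, 1024, dtype=torch.bfloat16)  # what training sees
x = torch.randn(1, 1024, dtype=torch.bfloat16)

# Simulate fp8 serving with a round-trip through float8_e4m3fn.
w_serve = w_train.to(torch.float8_e4m3fn).to(torch.bfloat16)

y_train = x @ w_train
y_serve = x @ w_serve
print((y_train - y_serve).abs().max())  # the train/rollout mismatch RL has to absorb
```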
Interesting that the quantization is applied to the routed experts but not to the shared one. My understanding is that the shared expert has plenty of time to compute while the routed experts' communication is in flight (Meituan fit two whole shared-expert layers into one MoE step with ScMoE), which is probably why.
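Mechanically, "quantize the routed experts but not the shared one" could look like the sketch below; the module names `.experts.` and `.shared_expert.` are hypothetical, not Kimi's real layout:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def quantize_routed_experts(model: nn.Module):
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and ".experts." in name:
            # Routed expert: int4 round-trip as a stand-in for a real int4 kernel.
            w = module.weight
            scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
            module.weight.copy_(torch.clamp(torch.round(w / scale), -8, 7) * scale)
        # Anything under `.shared_expert.` is skipped and stays full precision.
```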
Insane how far the open source frontier has come
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200-300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built…
> "200-300 sequential tool calls" this is really the impressive part of this release imo, can't wait to see how they did it
the scores are insane, very cool to see native INT4 quantization for the MoE layers > To overcome this challenge, we adopt Quantization-Aware Training (QAT) during the post-training phase, applying INT4 weight-only quantization to the MoE components. It allows K2 Thinking to…
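The quote names the standard recipe; here is a minimal sketch of INT4 fake-quant with a straight-through estimator, my illustration of generic QAT rather than Kimi's training code. Forward passes see the int4-rounded weights; backward pretends rounding is the identity so gradients still flow:

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    # Straight-through estimator: q in the forward pass, identity in backward.
    return w + (q - w).detach()

w = torch.randn(256, 256, requires_grad=True)
x = torch.randn(8, 256)
y = x @ fake_quant_int4(w).t()
y.sum().backward()
print(w.grad.abs().mean())  # gradients flow despite the rounding
```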
This feels like a big moment
we’re very close to 50% on HLE, and bonus point: it’s with an open model :)
ok we're at 51% with "heavy" mode > Heavy Mode: K2 Thinking Heavy Mode employs an efficient parallel strategy: it first rolls out eight trajectories simultaneously, then reflectively aggregates all outputs to generate the final result.
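The pattern described is roll-out-then-aggregate. A toy sketch of that shape, where `generate` stands in for sampling a trajectory and majority vote stands in for the reflective aggregation the quote describes:

```python
import random

def generate(prompt: str, seed: int) -> str:
    random.seed(seed)  # stand-in for sampling one full trajectory
    return f"answer-{random.randint(0, 2)}"

def aggregate(prompt: str, candidates: list[str]) -> str:
    # K2 aggregates reflectively with the model itself; majority vote is a
    # much simpler stand-in for that step.
    return max(set(candidates), key=candidates.count)

prompt = "a hard HLE question"
trajectories = [generate(prompt, seed=i) for i in range(8)]  # 8 parallel rollouts
print(aggregate(prompt, trajectories))
```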