elie

@eliebakouch

Training LLMs (now: @huggingface). Anon feedback: https://www.admonymous.co/eliebakouch

huggingface.co/eliebak

Joined January 2024

4K Posts · 9K Followers · 3K Following

Pinned

elie

@eliebakouch

Oct 30

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably. huggingface.co/spaces/Hugging…

elie

@eliebakouch

Nov 24

Great team/subject 👇

Zachary Charles

@MatharyCharles

Nov 24

I'm hiring an intern for our distributed optimization team in Google Research, based in Seattle. If you're interested and have relevant experience in DiLoCo-style things, please apply or let me know you've applied. google.com/about/careers/…

elie reposted

Olive Song

@olive_jy_song

Nov 23

Huge thanks to @swyx for inviting us to the @aiDotEngineer Summit — my first time attending, and what an incredible experience. Representing @MiniMax__AI at AIE/LEAD was an honor, and it meant a lot to see so many people resonated with our research and mission in the M-series.…

elie reposted

Yu Zhang 🐈🐙

@yzhang_cs

Nov 23

Dillon Uzar

@DillonUzar

Nov 22

Context Arena Update: Added kimi-linear-48b-a3b-instruct [11-08] and kimi-k2 (Thinking) [11-06] to the MRCR leaderboards. The Linear 48b results are fascinating! It actually outperforms the new Gemini 3.0 Pro Thinking on 4-needle and 8-needle tasks at higher context lengths…

elie reposted

underfit

@underfitai

Nov 21

Hello everyone! This is me, the student. A few weeks ago I learned about ZeRO in @TheZachMueller's awesome course. My PR set a new modded nanogpt record by improving the compute-comms overlap of the distributed Adam implementation. This resulted in a speedup of the Adam step…

Zach Mueller @ Neurips

@TheZachMueller

Nov 17

One of my students set a new modded nanogpt record (using the stuff learned in class)🔥 github.com/KellerJordan/m…

elie

@eliebakouch

Nov 22

"look @Grad62304977, they are trying to replace us"

elie

@eliebakouch

Nov 22

Just got out of the waitlist for scholar labs, the new google product for searching/asking info about papers! from my quick vibe test it's very good to find relevant papers on a specific subject, but it won’t get the little gems if it’s not one of the main focuses of the…

elie

@eliebakouch

Nov 21

i would take this kind of rumor VERY lightly. one obvious reason is that i’d like to think the people cooking at tbd don’t have the time or willingness to engage with this kind of post on blind

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

Nov 21

I refuse to believe to be real. If it is real, though, it perfectly explains the issue: Meta can't into modern machine learning because the whole organization has a learning disability.

elie

@eliebakouch

Nov 21

huge if true

kalomaze

@kalomaze

Nov 21

>wake up with sore throat

elie

@eliebakouch

Nov 21

Release it and keep training ☺️

Johannes Hagemann

@johannes_hage

Nov 21

if you have a large scale RL run going and your model is already SOTA for its size but the evals are still going up, what would you do? asking for a friend

elie reposted

Julien Chaumond

@julien_c

Nov 20

interesting how Gemini has 0 guards wrt celeb generation nowadays. Perceived risk acceptance has really changed a lot these past 2 years

elie reposted

Luca Soldaini 🌯 NeurIPS 2025

@soldni

Nov 20

Thread of appreciation for a few of the students and interns that made Olmo 3 special (just the ones i was fortunate to work with! all @allen_ai interns are great!!) 🧵

elie reposted

Cody Blakeney

@code_star

Nov 20

There is so much cool science to do on long context data research. It's basically never covered in tech reports other than the engineering efforts to solve sequence parallelism. Once again @allen_ai doing the Lord's work telling us the juicy details of what works and what doesn't

elie

@eliebakouch

Nov 20

this is so good, taking my time to read this tech report like a good movie

elie

@eliebakouch

Nov 20

happy olmo day for those who celebrate!!!

elie reposted

jianlin.su

@Jianlin_S

Nov 19

Muon Optimizer Guide: Quick Start & Key Details kexue.fm/archives/11416

elie reposted

Saurabh Shah @ neurips 🌊

@saurabh_shah2

Nov 20

Today's the day. Olmo 3 is out in the open 🐮 🦖 I've had a wonderful time working on this model. I focused on post-training Olmo 3 to write code. I wrote up some of my thoughts on my blog here: open.substack.com/pub/learnycurv…

Ai2

@allen_ai

Nov 20

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

elie reposted

Tyler Romero

@tyleraromero

Nov 20

Incredibly proud of the OLMo team! Alongside the new model releases, there’s a wealth of material for the community: a full research report detailing the entire training process, the Dolma 3 dataset, all intermediate checkpoints, and the complete training scripts.