elie

@eliebakouch

Training llm's (now: @huggingface) anon feedback: https://www.admonymous.co/eliebakouch

Pinned

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably huggingface.co/spaces/Hugging…

Great team/subject 👇

I'm hiring an intern for our distributed optimization team in Google Research, based in Seattle. If you're interested and have relevant experience in DiLoCo-style things, please apply or let me know you've applied. google.com/about/careers/…



elie reposted

Huge thanks to @swyx for inviting us to the @aiDotEngineer Summit — my first time attending, and what an incredible experience. Representing @MiniMax__AI at AIE/LEAD was an honor, and it meant a lot to see so many people resonate with our research and mission in the M-series.…


elie reposted
yzhang_cs's tweet image.

Context Arena Update: Added kimi-linear-48b-a3b-instruct [11-08] and kimi-k2 (Thinking) [11-06] to the MRCR leaderboards. The Linear 48b results are fascinating! It actually outperforms the new Gemini 3.0 Pro Thinking on 4-needle and 8-needle tasks at higher context lengths…



elie reposted

Hello everyone! This is me, the student. A few weeks ago I learned about ZeRO in @TheZachMueller's awesome course. My PR set a new modded nanogpt record by improving the compute-comms overlap of the distributed Adam implementation. This resulted in a speedup of the Adam step…

One of my students set a new modded nanogpt record (using the stuff learned in class)🔥 github.com/KellerJordan/m…



"look @Grad62304977, they are trying to replace us"


Just got out of the waitlist for scholar labs, the new google product for searching/asking info about papers! from my quick vibe test it's very good to find relevant papers on a specific subject, but it won’t get the little gems if it’s not one of the main focuses of the…



i would take this kind of rumor VERY lightly. one obvious reason is that i’d like to think the people cooking at tbd don’t have the time or willingness to engage with this kind of post on blind

I refuse to believe to be real
if it is real, though, it perfectly explains the issue
Meta can't into modern machine learning because the whole organization has a learning disability.



huge if true


>wake up with sore throat



Release it and keep training ☺️

if you have a large scale RL run going and your model is already SOTA for its size but the evals are still going up, what would you do? asking for a friend



elie reposted

interesting how Gemini has 0 guards wrt celeb generation nowadays. Perceived risk acceptance has really changed a lot these past 2 years


elie reposted

Thread of appreciation for a few of the students and interns that made Olmo 3 special (just the ones i was fortunate to work with! all @allen_ai interns are great!!) 🧵


elie reposted

There is so much cool science to do on long context data research. It's basically never covered in tech reports other than the engineering efforts to solve sequence parallelism. Once again @allen_ai doing the Lord's work telling us the juicy details of what works and what doesn't

this is so good, taking my time to read this tech report like a good movie




happy olmo day for those who celebrate!!!



elie reposted

Muon Optimizer Guide: Quick Start & Key Details kexue.fm/archives/11416


elie reposted

Today's the day. Olmo 3 is out in the open 🐮 🦖 I've had a wonderful time working on this model. I focused on post-training Olmo 3 to write code. I wrote up some of my thoughts on my blog here: open.substack.com/pub/learnycurv…

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵



elie reposted

Incredibly proud of the OLMo team! Alongside the new model releases, there’s a wealth of material for the community: a full research report detailing the entire training process, the Dolma 3 dataset, all intermediate checkpoints, and the complete training scripts.



