elie
@eliebakouch
Training LLMs (now: @huggingface). Anon feedback: https://www.admonymous.co/eliebakouch
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably huggingface.co/spaces/Hugging…
Great team/subject 👇
I'm hiring an intern for our distributed optimization team in Google Research, based in Seattle. If you're interested and have relevant experience in DiLoCo-style things, please apply or let me know you've applied. google.com/about/careers/…
Huge thanks to @swyx for inviting us to the @aiDotEngineer Summit — my first time attending, and what an incredible experience. Representing @MiniMax__AI at AIE/LEAD was an honor, and it meant a lot to see so many people resonated with our research and mission in the M-series.…
Context Arena Update: Added kimi-linear-48b-a3b-instruct [11-08] and kimi-k2 (Thinking) [11-06] to the MRCR leaderboards. The Linear 48b results are fascinating! It actually outperforms the new Gemini 3.0 Pro Thinking on 4-needle and 8-needle tasks at higher context lengths…
Hello everyone! This is me, the student. A few weeks ago I learned about ZeRO in @TheZachMueller's awesome course. My PR set a new modded nanogpt record by improving the compute-comms overlap of the distributed Adam implementation. This resulted in a speedup of the Adam step…
One of my students set a new modded nanogpt record (using the stuff learned in class)🔥 github.com/KellerJordan/m…
"look @Grad62304977, they are trying to replace us"
Just got off the waitlist for Scholar Labs, the new Google product for searching/asking info about papers! from my quick vibe test it's very good at finding relevant papers on a specific subject, but it won’t get the little gems if it’s not one of the main focuses of the…
i would take this kind of rumor VERY lightly. one obvious reason is that i’d like to think the people cooking at tbd don’t have the time or willingness to engage with this kind of post on blind
I refuse to believe this is real. If it is real, though, it perfectly explains the issue: Meta can't into modern machine learning because the whole organization has a learning disability.
huge if true
Release it and keep training ☺️
if you have a large scale RL run going and your model is already SOTA for its size but the evals are still going up, what would you do? asking for a friend
interesting how Gemini has 0 guards wrt celeb generation nowadays. Perceived risk acceptance has really changed a lot these past 2 years
Thread of appreciation for a few of the students and interns that made Olmo 3 special (just the ones i was fortunate to work with! all @allen_ai interns are great!!) 🧵
There is so much cool science to do on long context data research. It's basically never covered in tech reports other than the engineering efforts to solve sequence parallelism. Once again @allen_ai doing the lord's work telling us the juicy details of what works and what doesn't
this is so good, taking my time to read this tech report like a good movie
Muon Optimizer Guide: Quick Start & Key Details kexue.fm/archives/11416
Today's the day. Olmo 3 is out in the open 🐮 🦖 I've had a wonderful time working on this model. I focused on post-training Olmo 3 to write code. I wrote up some of my thoughts on my blog here: open.substack.com/pub/learnycurv…
Incredibly proud of the OLMo team! Alongside the new model releases, there’s a wealth of material for the community: a full research report detailing the entire training process, the Dolma 3 dataset, all intermediate checkpoints, and the complete training scripts.