
Cody Blakeney

@code_star

Data Dawg @datologyai | Formerly Data Research Lead @DbrxMosaicAI | Visiting Researcher @ Facebook | Ph.D | #TXSTFOOTBALL fan | http://linktr.ee/code_star

Pinned

I've got something new for everyone. My first substack article! Not the one I planned to do first, but a fun one! I have made a handy calculator based on the DeepSeek v1 coefficients for finding optimal LR and batch sizes for dense LLMs.

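For anyone curious what the calculator is doing under the hood: the DeepSeek LLM (v1) paper fits the optimal peak learning rate and batch size as power laws of the training compute budget C. Below is a minimal sketch of that kind of calculator, assuming the usual C ≈ 6·N·D estimate for compute; the coefficient values are the ones I recall from the paper's fits, so double-check them against the paper (or just use the calculator) before relying on them.

```python
# Minimal sketch of an LR / batch-size calculator in the spirit of the
# DeepSeek LLM (v1) hyperparameter scaling laws. Coefficients below are
# the values I recall from the paper's power-law fits; verify against the
# paper before using them for a real run.

def compute_budget(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the standard C ~= 6 * N * D rule."""
    return 6.0 * n_params * n_tokens


def optimal_lr(c: float) -> float:
    """Fitted optimal peak learning rate as a power law of compute C (FLOPs)."""
    return 0.3118 * c ** -0.1250


def optimal_batch_tokens(c: float) -> float:
    """Fitted optimal batch size (in tokens) as a power law of compute C (FLOPs)."""
    return 0.2920 * c ** 0.3271


if __name__ == "__main__":
    # Example: a 7B-parameter dense model trained on 2T tokens.
    C = compute_budget(n_params=7e9, n_tokens=2e12)
    print(f"compute budget C ~= {C:.3e} FLOPs")
    print(f"optimal peak LR  ~= {optimal_lr(C):.2e}")
    print(f"optimal batch    ~= {optimal_batch_tokens(C):,.0f} tokens")
```

With these coefficients, the 7B/2T example works out to a peak LR around 4e-4 and a batch size on the order of 9M tokens.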

Cody Blakeney reposted

torch main source build


TORCH NIGHTLY



Cody Blakeney reposted

binary searching nightlies is a hell of a feeling

TORCH NIGHTLY



Cody Blakeney reposted

TORCH NIGHTLY


Cody Blakeney reposted

lol dumb LLM data podcast idea “Talking Tokens”


gotta give the people what they want

Incredibly interested. The world deserves this



Cody Blakeney reposted

Nothing mid about this training 😤

Since mid-training is eating them both, we might as well call it training.



Cody Blakeney reposted

I guess what I’m getting at here is that sparsity performance is an engineering problem, and the science is pretty clear that you can make the models big without changing the theoretical inference performance. Google has really good engineers. It doesn’t really seem like scaling…

Can I ask a dumb question? Let’s say it is 7.5T total parameters. Now that super sparse MoEs are the norm … who cares how big the parameters get? 8x more total params than Kimi shouldn’t be hard or surprising for one of the world’s best-capitalized companies. 15T next year…
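To make the intuition in these two posts concrete: per-token inference compute in a sparse MoE tracks the active (routed) parameters, not the total, so total parameter count can grow by adding experts without changing per-token FLOPs. A rough sketch is below; the layer shapes and expert counts are made up purely for illustration.

```python
# Rough illustration of why total MoE parameters can grow without changing
# per-token inference compute: only the experts routed to each token run.
# All shapes/counts here are made-up for illustration.

def moe_ffn_param_counts(n_layers: int, d_model: int, d_ff: int,
                         n_experts: int, top_k: int) -> tuple[float, float]:
    """Return (total, active_per_token) expert-FFN parameter counts."""
    ffn_params = 2 * d_model * d_ff            # one expert's up + down projections
    total = n_layers * n_experts * ffn_params  # every expert exists in memory
    active = n_layers * top_k * ffn_params     # but only top_k run per token
    return total, active


# Adding experts grows total params 4x while active (per-token) params stay fixed.
for n_experts in (64, 256):
    total, active = moe_ffn_param_counts(n_layers=60, d_model=7168, d_ff=2048,
                                         n_experts=n_experts, top_k=8)
    print(f"{n_experts:>3} experts: ~{total / 1e12:.2f}T total FFN params, "
          f"~{active / 1e9:.1f}B active per token")
```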



Cody Blakeney reposted

Canonically I believe this is Olmo: Tokyo Drift

This release has SO MUCH
• New pretrain corpus, new midtrain data, 380B+ long context tokens
• 7B & 32B, Base, Instruct, Think, RL Zero
• Close to Qwen 3 performance, but fully open!!


Cody Blakeney reposted

Olmo 3: Rawr XD


team Rawr 🫡



Not for nothing, the Nemotron Nano 2 paper also had one of these cool untalked-about facts, which led me to make this awesome Claude Shannon meme for a slide once. You need to train at least 2x your desired effective sequence length to get good performance.
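As a toy illustration of that rule of thumb (not the actual Nemotron Nano 2 recipe, just the arithmetic): pick the long-context training sequence length to be at least twice the effective context you want the model to handle well.

```python
# Toy illustration of the "train at >= 2x your desired effective context"
# rule of thumb. The 2x factor is the heuristic quoted above, not a law.

def min_training_seq_len(target_effective_ctx: int, factor: float = 2.0) -> int:
    """Minimum training sequence length for a desired effective context length."""
    return int(factor * target_effective_ctx)


for target in (8_192, 32_768, 131_072):
    print(f"want ~{target:>7} usable context -> train on >= "
          f"{min_training_seq_len(target):>7}-token sequences")
```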


There is so much cool science to do on long context data research. It's basically never covered in tech reports other than the engineering efforts to solve sequence parallelism. Once again @allen_ai doing the lord's work, telling us the juicy details of what works and what doesn't




this is so good, taking my time to read this tech report like a good movie



Cody Blakeney reposted

this is so good, taking my time to read this tech report like a good movie


happy olmo day for those who celebrate!!!



Be sure to subscribe so you don’t miss it. I’m hoping to actually get some quotes from dataset creators as well. It should be a lot of fun. open.substack.com/pub/cod3star

I'm thinking about doing a fun history of LLM datasets series on my substack with my partner in crime @_BrettLarsen. Would anyone be interested in that? Part reading list, part oral history, and part recounting the bad old days when we counted tokens up hills both ways.



People who I know have trained big models (maybe bigger) have liked this tweet, and in the replies I have people telling me it's impractical and can't be done. smh.

Can I ask a dumb question? Let’s say it is 7.5T total parameters. Now that super sparse MoEs are the norm … who cares how big the parameters get? 8x more total params than Kimi shouldn’t be hard or surprising for one of the world’s best-capitalized companies. 15T next year…



Cody Blakeney reposted

honestly getting carried by the impressive students @hamishivi @scottgeng00 @VictoriaWGraf @heinemandavidj @abertsch72 @MayeeChen @saumyamalik44 @mnoukhov @jacobcares and others 🙏🏻


Cody Blakeney reposted

yeah, don't forget all the other goats. It's a goat farm!



I think we should have 100T parameter MoEs



Cody Blakeney reposted

TECH REPORTS WITH INFORMATION AND STUFF

104 PAGES

WE ARE SO, SO, SO BACK!!!!

Omg I just realized @pjreddie joined AI2 and now they are doing unhinged off-axis plots. Total yolo victory.

This release has SO MUCH
• New pretrain corpus, new midtrain data, 380B+ long context tokens
• 7B & 32B, Base, Instruct, Think, RL Zero
• Close to Qwen 3 performance, but fully open!!


Cody Blakeney reposted

Releases like this are, to me, more exciting than (very impressive) new SoTA models... Because when the OLMo team 🔥COOKS🔥 like this, we all get to read about it and learn from them!

This release has SO MUCH
• New pretrain corpus, new midtrain data, 380B+ long context tokens
• 7B & 32B, Base, Instruct, Think, RL Zero
• Close to Qwen 3 performance, but fully open!!

