Saurabh Shah

@saurabh_shah2

training olmos @allen_ai prev @Apple @Penn 🎤dabbler of things🎸 🐈‍⬛enjoyer of cats 🐈 and mountains🏔️he/him

Seattle, WA

saurabhs.site

Joined December 2022

2K posts 3K followers 2K following

Saurabh Shah

@saurabh_shah2

Nov 22

Ripped a good one post Olmo release

Saurabh Shah

@saurabh_shah2

Nov 22

The thought of Tyler having to ask for an interview is hilarious to me. Instead top labs should be asking Tyler if it’s ok to interview him. (PLEASE DONT POACH HIM TY)

Olmo 3 afterglow - want to share how I came to join Ai2 to encourage others interested in LLM research. I found it difficult to break into the field as someone who does not hold a phd. Research roles at top labs are highly competitive and I didn’t have professional experience…

tyleraromero's tweet image.

Saurabh Shah reposted

Nathan Lambert

@natolambert

Nov 21

Going to be calling my professor friends to make sure they admit Saumya. They're going to be thanking me and asking for advice on how to convince her to go to their school.

Saumya Malik

@saumyamalik44

Nov 21

Olmo 3 is out!!!! It was so much fun working on post-training. Loved seeing this come together with the best team!!!!

Saurabh Shah

@saurabh_shah2

Nov 21

sure I can. On slack I just have to @ @hamishivi and @finbarrtimbers and explain in natural language the problem and then the fire gets put out

Christina Farhat

@farhatchristina

Nov 21

someone asked fei fei about intelligence llms vs. world models yesterday and she was like can you put out a fire with natural language i -

Saurabh Shah

@saurabh_shah2

Nov 21

alright, @PrimeIntellect @arcee_ai @datologyai your turn now. we're waiting. Let it rip 🫡

Saurabh Shah

@saurabh_shah2

Nov 21

yes pls! full transparency (that's kinda what we do): there was a lot to organize for the Olmo 3 release. We were working down to the wire, and we're far from perfect. If you have any q's pls reach out (twitter is good, email is great) and we'll be on it. team is pretty…

Kyle Lo

@kylelostat

Nov 21

yay thanks so much 🥰 feedback on stuff we missed or asks for more info very much welcome!

Saurabh Shah reposted

Hamish Ivison

@hamishivi

Nov 21

release and keep going. done this twice this week ;)

Saurabh Shah

@saurabh_shah2

Nov 21

Me to @hamishivi when he asked me if I tested multi-mode checkpointing

theseriousadult

@gallabytes

Oct 23

Just Claude Things™️

Saurabh Shah

@saurabh_shah2

Nov 20

Which previous setups Michael. Which ones 🤔

Michael Noukhovitch ....🏄 NeurIPS 2025

@mnoukhov

Nov 20

Because Olmo 3 is fully open, we decontaminate our evals from our pretraining and midtraining data. @StellaLisy proves this with spurious rewards: RL trained on a random reward signal can't improve on the evals, unlike some previous setups

Saurabh Shah reposted

Ethan Shen

@ethnlshn

Nov 20

Saurabh Shah

@saurabh_shah2

Nov 20

Yes @ profs this is a not-so-inside scoop 👇 @saumyamalik44 and @heinemandavidj are applying to PhDs this cycle, which is kinda funny bc they’re already operating at senior PhD levels 💪 You’re rly gonna want them in your lab

Jacob Morrison

@jacobcares

Nov 20

and if you happen to be a professor hiring students this cycle, keep an eye out for @saumyamalik44 and @heinemandavidj's applications, I think you'll have fierce competition...

Saurabh Shah

@saurabh_shah2

Nov 20

Everyone seems to be reading the Olmo 3 paper today. Seems pretty cool. Maybe I'll read it at some point. Probably not though

Saurabh Shah

@saurabh_shah2

Nov 20

yes, definitely crashed out exactly 0 times to Kat during release week...😅

Kat

@baneepbanana

Nov 20

Congrats Olmo team!! Good job @saurabh_shah2 for not breaking down even one single time. Everyone else go read his blog post

Saurabh Shah

@saurabh_shah2

Nov 20

Oh yeah. Costa still secretly works at Ai2 btw. Don't tell @LiamFedus

Michael Noukhovitch ....🏄 NeurIPS 2025

@mnoukhov

Nov 20

All this work is with the great peeps at @allen_ai who put in a lot of work including putting up with the weird memes I post in slack, #1 manager @natolambert, @finbarrtimbers @saurabh_shah2 @hamishivi @HannaHajishirzi and even @vwxyzjn who advised behind the scenes

Saurabh Shah

@saurabh_shah2

Nov 20

check out the RLZero section of the paper and our RLZero artifacts!! Michael did an awesome job leading this, and it was tons of fun working with him to try to RL some ~high entropy~ models

Michael Noukhovitch ....🏄 NeurIPS 2025

@mnoukhov

Nov 20

It's also a great setup for multi-objective RL! @saurabh_shah2 and I created four data domains: math, code, instruction-following, and general chat, so you can study their interaction during RL finetuning

Saurabh Shah

@saurabh_shah2

Nov 20

American Chinese Open Ai 2

Saurabh Shah

@saurabh_shah2

Nov 20

i was sooo friggin giddy every time @finbarrtimbers, @mnoukhov or @hamishivi impl a new RL optimization. In some settings some of these changes were literally a 4x wall-clock time speedup (shoutout inflight updates a.k.a. pipelineRL). Ty for your service, Finbarr

finbarr

@finbarrtimbers

Nov 20

The OlmoRL infrastructure was 4x faster than Olmo 2 and made it much cheaper to run experiments. Some of the changes: 1. continuous batching 2. in-flight updates 3. active sampling 4. many many improvements to our multi-threading code

Saurabh Shah reposted

Hanna Hajishirzi

@HannaHajishirzi

Nov 20

No @hamishivi No RL.

Hamish Ivison

@hamishivi

Nov 20

Did a bunch on the RL & eval side for this guy! Worked mostly on think RL data, training, eval. look out for the doki doki literature club links in the technical report - lots of war stories from this release 👀