
Saurabh Shah

@saurabh_shah2

training olmos @allen_ai prev @Apple @Penn 🎤dabbler of things🎸 🐈‍⬛enjoyer of cats 🐈 and mountains🏔️he/him

Saurabh Shah reposted

yo has anyone heard of this Olmo model, loss looks good


why does sonnet 4.5 make like different READMEs for every script it writes lmao this sucks


Ohhhh it's called periodic cuz like the periodic table cuz they're doing chemistry and stuff. That's cool

today you will be presented 2 visions of humanity's future with AI

if you don't want to build the infinite AI tiktok slop machine but want to develop AI that accelerates fundamental science, raising civilization to Kardashev 1 and beyond, come join us at @periodiclabs



wayyy cooler than Sora lmao. If you have the privilege to be picky, you should work on things like this, not the infinite slop machine. Scale simulation, scale learning from experience, and solve our hardest problems by training systems that can think in unhuman-like ways

Today, @ekindogus and I are excited to introduce @periodiclabs. Our goal is to create an AI scientist. Science works by conjecturing how the world might be, running experiments, and learning from the results. Intelligence is necessary, but not sufficient. New knowledge is…



It's simple I think: Sonnet 4.5 for most stuff spanning easy to pretty challenging tasks. Especially good for quick scripts. GPT-5-codex high for the most challenging issues that I don't mind waiting a while for. Very surgical! No other model rly matters for coding rn IMO


First Bob now Kevin 🙄

SakanaAI presents Robust Agentic CUDA Kernel Optimization
• Fuses ops, boosts forward/backward passes, outperforms torch baselines
• Agentic LLM pipeline: PyTorch → CUDA → evolutionary runtime optimization
• Soft-verification: LLMs flag incorrect kernels (↑30% verification…



Figured it out nw


anyone know anyone at @EvoscaleAI in nyc? Would love to chat this week!



If you visit me in Seattle like @aryaman2020 and @michaelryan207 I will show you what life’s all about

Or at least take you to Ai2 office where the snacks are pretty good




I’ll be in nyc next week! Would love to grab a coffee or a drink with folks. I’m interested in language models, especially how they relate to:
— reinforcement learning
— code gen and agents
— accelerating science, especially biology and protein design


Happy to say Ai2 is now on the frontier of blueberry size. Deepseek moment for big blueberries


Holy shit they’re doing on-policy RL by just deploying the model to prod lmao that’s so baller.

also 2 hrs for a training step makes our 10 minute steps feel lightning fast @hamishivi
… they probably have a bigger batch size though 😅


We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.
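Those two percentages compound, and a quick back-of-the-envelope check (my own arithmetic, not Cursor's) shows why the trade is good: accepted suggestions stay roughly flat while rejected interruptions drop.

```python
# Back-of-the-envelope check on the Cursor Tab numbers (my arithmetic,
# not theirs): 21% fewer suggestions at a 28% higher accept rate.
suggestions = 0.79   # suggestion volume relative to the old model
accept_rate = 1.28   # accept rate relative to the old model

accepted = suggestions * accept_rate  # relative accepted suggestions
print(f"relative accepted suggestions: {accepted:.3f}")

# Roughly the same number of accepted suggestions (~+1%),
# with ~21% fewer rejected interruptions for the user.
```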



one of those days


RL env is a thing that runs LM generation and produces a score (we call this the reward)

eval is a thing that runs LM generation and produces a score

good abstraction by @willccbb to unify these in verifiers

wait "environments" are just evals? did i misread something...? i thought there would be various app mockups, website clones, games, etc. to help simulate things that folks are looking to automate. (unless this is some meta point about evals == envs?)
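The unification reads naturally as one interface. A minimal sketch (illustrative names only, not the actual verifiers API): anything that runs generation and returns a score can be scored as an eval or used as an RL reward source.

```python
from typing import Callable, Protocol

class ScoredTask(Protocol):
    """Illustrative unified interface (not the real verifiers API):
    run LM generation, produce a score."""
    def run(self, generate: Callable[[str], str]) -> float: ...

class ExactMatch:
    """Toy task: score 1.0 iff the model's output matches the answer."""
    def __init__(self, prompt: str, answer: str):
        self.prompt, self.answer = prompt, answer

    def run(self, generate: Callable[[str], str]) -> float:
        return 1.0 if generate(self.prompt).strip() == self.answer else 0.0

task = ExactMatch("2+2=", "4")
# used as an eval: the score is the metric you report
score = task.run(lambda p: "4")
# used as an RL env: the same score is the reward for the rollout
reward = task.run(lambda p: "5")
print(score, reward)
```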



go birds

like being part of the 2024 Eagles - total championship mentality. Excited to be investing more in @cognition and joining the team with @ScottWu46 and @russelljkaplan - Amazing to see the power law at work



Me when I’ve written a singleton class called PlasticBottle and I’ve already created an instance of it

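For anyone who hasn't met the pattern: a singleton class only ever yields one instance, which is the whole joke about a "single-use" bottle. A minimal Python sketch (my illustration; nothing from the tweet beyond the class name):

```python
class PlasticBottle:
    """Singleton: no matter how many times you 'create' a bottle,
    you get the same single-use instance back."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

first = PlasticBottle()
second = PlasticBottle()   # no new bottle is made
print(first is second)     # True: still the same bottle
```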

Yeah. A “bitter lesson” I’ve been coming around to is products used by 1 billion people are not just helpful, but maybe necessary to push the frontier of what models can do…

The bitter lesson here is you can't experiment with continual learning, unless you have continual interactions, and A/B testing at scale



If you’re interp-pilled you should also be olmo-pilled FYI

we will never have direct insights into this or what kind of datamix limitations exist that caused it, of course. >"interpretability for me, but not for thee"



GitHub losing to both hugging face and cursor needs to be studied

Why does GitHub lfs suck so much! Mostly genuine question, what is hard about this (or: why is hf/xet impressive?)


rubric based rewards coming soon to an olmo near you 🫡

for the first time i am aware of, there is an entirely private subfield of AI research

every company that actually trains models is doing RL with rubrics and LLM-judged rewards

but academic work is stuck on RL with automated rewards (math problems and code). much cleaner for…
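For concreteness, a minimal sketch of what an LLM-judged rubric reward can look like (entirely illustrative; `judge` stands in for an LLM call, and none of this is Olmo's actual implementation):

```python
from typing import Callable

# Hypothetical rubric: each item is a criterion the judge checks.
RUBRIC = [
    "states a final answer explicitly",
    "shows its reasoning",
    "stays on topic",
]

def rubric_reward(response: str, judge: Callable[[str, str], bool]) -> float:
    """Reward = fraction of rubric items the judge says are satisfied.
    `judge(response, item)` is a stand-in for an LLM-judge call."""
    hits = sum(judge(response, item) for item in RUBRIC)
    return hits / len(RUBRIC)

# Toy judge so the sketch runs without a model: satisfied iff non-empty.
toy_judge = lambda response, item: len(response) > 0
print(rubric_reward("The answer is 42 because...", toy_judge))  # 1.0
```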


