
Oleg Rybkin

@_oleh

RL at Cursor

Pinned

Want more scaling laws for value-based RL? Preston and I analyzed scaling model size! Larger models predictably improve data efficiency and performance, reduce overfitting, and allow using larger batch sizes. After this, I am now more optimistic than ever about TD-learning.

If we have tons of compute to spend to train value functions, how can we be sure we're spending it optimally? In our new paper, we analyze the interplay of model size, UTD, and batch size for training value functions to achieve optimal performance. arxiv.org/abs/2508.14881
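For readers unfamiliar with the term, "UTD" is the update-to-data ratio: how many gradient updates the value function takes per environment transition collected. A minimal toy sketch of that loop follows; all names, the toy dynamics, and hyperparameters here are illustrative assumptions, not from the paper:

```python
import random

def td_update(q, transition, alpha=0.1, gamma=0.99):
    """One tabular TD(0)-style update for a (s, a, r, s') transition."""
    s, a, r, s_next = transition
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])

def train(env_steps, utd=2, batch_size=4):
    # Toy two-state, two-action problem with made-up random-walk dynamics.
    q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    buffer = []
    rng = random.Random(0)
    for _ in range(env_steps):
        # Collect exactly one transition into the replay buffer...
        s = rng.choice((0, 1))
        a = rng.choice((0, 1))
        buffer.append((s, a, float(a == s), rng.choice((0, 1))))
        # ...then take `utd` update steps on sampled minibatches.
        # Raising `utd` spends more compute per unit of data.
        for _ in range(utd):
            batch = rng.sample(buffer, min(batch_size, len(buffer)))
            for tr in batch:
                td_update(q, tr)
    return q

q = train(200, utd=2, batch_size=4)
```

The paper's question, in these terms, is how `utd`, `batch_size`, and model capacity should be jointly scaled as the compute budget grows.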



Oleg Rybkin reposted

one of the best moments at BAIR lab, never imagined I'd spot prof @redstone_hong on a random day


one of the best moments at BAIR lab, never imagined I'd spot prof sergey levine from physical intelligence on a random day



Oleg Rybkin reposted

It has been a joy working on Composer with the team and watching all the pieces come together over the past few months. I hope people find the model useful.

Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. cursor.com/blog/composer Excited for the potential of building specialized models to help in critical domains.



Oleg Rybkin reposted

Some exciting news to share - I joined Cursor! We just shipped a model 🐆 It's really good - try it out! cursor.com/blog/composer I left OpenAI after 3 years there and moved to Cursor a few weeks ago. After working on RL for my whole career, it was incredible to see RL come alive…

Introducing Cursor 2.0. Our first coding model and the best way to code with agents.



Oleg Rybkin reposted

some folks and i are making something new if you're hopeful about AI empowering everyone if you've worked on multiturn, memory, model behavior, multiagent RL, user sim, AI interfaces/products, kernels, or dist systems if you want frontier-scale compute & top infra let's chat!


Oleg Rybkin reposted

Introducing Cursor 2.0. Our first coding model and the best way to code with agents.


Seems like a good day to announce that I have decided to join Cursor! Excited about training RL agents on the correct task distribution :)


Oleg Rybkin reposted

With the right design decisions, value-based RL admits predictable scaling. value-scaling.github.io We wrote a blog post on our two papers challenging conventional wisdom that off-policy RL methods are fundamentally unpredictable.
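"Predictable scaling" here means a performance metric follows a power law in a resource, so a curve fit on small runs extrapolates to larger ones. A minimal sketch of such a fit, using entirely synthetic data (the exponent, coefficient, and sizes below are made up for illustration):

```python
import numpy as np

# Synthetic model sizes and an idealized power-law loss curve.
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])  # parameter counts (made up)
loss = 2.0 * sizes ** -0.12                  # loss = a * N^(-b), chosen by hand

# A power law is linear in log-log space:
# log(loss) = log(a) - b * log(N), so fit a line by least squares.
coeffs = np.polyfit(np.log(sizes), np.log(loss), 1)
b, log_a = -coeffs[0], coeffs[1]

# Extrapolate the fitted curve to a model size outside the fit range.
pred = np.exp(log_a) * (1e9) ** -b
```

Real RL training curves are noisier than this idealized data, which is exactly why demonstrating that off-policy RL admits such fits is a nontrivial claim.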


Oleg Rybkin reposted

@preston_fu @_oleh and I wrote a blog post on scaling laws and value function based RL, summarizing our two papers in this direction and discussing open questions! value-scaling.github.io Check it out! Feedback & comments are very welcome!


Oleg Rybkin reposted

We have been doing work on scaling laws for off-policy RL for some time now and we just put a new paper out: arxiv.org/abs/2508.14881 Here, @preston_fu @_oleh lead a study on how to best allocate compute for training value functions in deep RL: 🧵⬇️


Oleg Rybkin reposted

Following up on our work on scaling laws for value-based RL (led by @_oleh and @preston_fu), we've been trying to figure out compute optimal parameters for value-based RL training. Check out Preston's post about our findings!




Oleg Rybkin reposted

How can we best scale up value based RL? We need to use bigger models, which mitigate what we call “TD-overfitting” (more below!👇 🧵 ). Further, we need to scale batch size and UTD accordingly as the models get bigger. Great work led by @preston_fu and @_oleh




📈📈📈



