
Oleg Rybkin

@_oleh

RL at Cursor

Pinned

Want more scaling laws for value-based RL? Preston and I analyzed scaling model size! Larger models predictably improve data efficiency and performance, reduce overfitting, and allow using larger batch sizes. After this, I am now more optimistic than ever about TD-learning.

If we have tons of compute to spend to train value functions, how can we be sure we're spending it optimally? In our new paper, we analyze the interplay of model size, UTD, and batch size for training value functions to achieve optimal performance. arxiv.org/abs/2508.14881
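For readers unfamiliar with the term, "UTD" is the update-to-data ratio: how many gradient updates the value function takes per environment transition collected. A minimal toy sketch of that loop follows; all names, the toy dynamics, and hyperparameters here are illustrative assumptions, not from the paper:

```python
import random

def td_update(q, transition, alpha=0.1, gamma=0.99):
    """One tabular TD(0)-style update for a (s, a, r, s') transition."""
    s, a, r, s_next = transition
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])

def train(env_steps, utd=2, batch_size=4):
    # Toy two-state, two-action problem with made-up random-walk dynamics.
    q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
    buffer = []
    rng = random.Random(0)
    for _ in range(env_steps):
        # Collect exactly one transition into the replay buffer...
        s = rng.choice((0, 1))
        a = rng.choice((0, 1))
        buffer.append((s, a, float(a == s), rng.choice((0, 1))))
        # ...then take `utd` update steps on sampled minibatches.
        # Raising `utd` spends more compute per unit of data.
        for _ in range(utd):
            batch = rng.sample(buffer, min(batch_size, len(buffer)))
            for tr in batch:
                td_update(q, tr)
    return q

q = train(200, utd=2, batch_size=4)
```

The paper's question, in these terms, is how `utd`, `batch_size`, and model capacity should be jointly scaled as the compute budget grows.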



Oleg Rybkin reposted

one of the best moments at BAIR lab, never imagined I'd spot prof @redstone_hong on a random day


one of the best moments at BAIR lab, never imagined I'd spot prof sergey levine from physical intelligence on a random day



Oleg Rybkin reposted

It has been a joy working on Composer with the team and watching all the pieces come together over the past few months. I hope people find the model useful.

Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. cursor.com/blog/composer Excited for the potential of building specialized models to help in critical domains.



Oleg Rybkin reposted

Some exciting news to share - I joined Cursor! We just shipped a model 🐆 It's really good - try it out! cursor.com/blog/composer I left OpenAI after 3 years there and moved to Cursor a few weeks ago. After working on RL for my whole career, it was incredible to see RL come alive…

Introducing Cursor 2.0. Our first coding model and the best way to code with agents.



Oleg Rybkin reposted

some folks and i are making something new if you're hopeful about AI empowering everyone if you've worked on multiturn, memory, model behavior, multiagent RL, user sim, AI interfaces/products, kernels, or dist systems if you want frontier-scale compute & top infra let's chat!


Oleg Rybkin reposted

Introducing Cursor 2.0. Our first coding model and the best way to code with agents.


Seems like a good day to announce that I have decided to join Cursor! Excited about training RL agents on the correct task distribution :)


Oleg Rybkin reposted

With the right design decisions, value-based RL admits predictable scaling. value-scaling.github.io We wrote a blog post on our two papers challenging conventional wisdom that off-policy RL methods are fundamentally unpredictable.
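"Predictable scaling" here means a performance metric follows a power law in a resource, so a curve fit on small runs extrapolates to larger ones. A minimal sketch of such a fit, using entirely synthetic data (the exponent, coefficient, and sizes below are made up for illustration):

```python
import numpy as np

# Synthetic model sizes and an idealized power-law loss curve.
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])  # parameter counts (made up)
loss = 2.0 * sizes ** -0.12                  # loss = a * N^(-b), chosen by hand

# A power law is linear in log-log space:
# log(loss) = log(a) - b * log(N), so fit a line by least squares.
coeffs = np.polyfit(np.log(sizes), np.log(loss), 1)
b, log_a = -coeffs[0], coeffs[1]

# Extrapolate the fitted curve to a model size outside the fit range.
pred = np.exp(log_a) * (1e9) ** -b
```

Real RL training curves are noisier than this idealized data, which is exactly why demonstrating that off-policy RL admits such fits is a nontrivial claim.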


Oleg Rybkin reposted

@preston_fu @_oleh and I wrote a blog post on scaling laws and value function based RL, summarizing our two papers in this direction and discussing open questions! value-scaling.github.io Check it out! Feedback & comments are very welcome!


Oleg Rybkin reposted

We have been doing work on scaling laws for off-policy RL for some time now and we just put a new paper out: arxiv.org/abs/2508.14881 Here, @preston_fu @_oleh lead a study on how to best allocate compute for training value functions in deep RL: 🧵⬇️


Oleg Rybkin reposted

Following up on our work on scaling laws for value-based RL (led by @_oleh and @preston_fu), we've been trying to figure out compute optimal parameters for value-based RL training. Check out Preston's post about our findings!




Oleg Rybkin reposted

How can we best scale up value based RL? We need to use bigger models, which mitigate what we call “TD-overfitting” (more below!👇 🧵 ). Further, we need to scale batch size and UTD accordingly as the models get bigger. Great work led by @preston_fu and @_oleh




📈📈📈



