Sasha Rush

@srush_nlp

Researcher at Cursor https://www.youtube.com/@srush_nlp

Sasha Rush reposted

Got addicted to @srush_nlp's Tensor Puzzles, so I wrote a sequel with more puzzles: github.com/hardik-vala/Te…. Example:

Sasha Rush reposted

SO lucky to have Alex intern with us through Olmo 3 development & see his massive contributions to our pretrain data 🐟Alex's created WebOrganizer (ICML 2025) which moved us beyond "quality? ✅❌" towards "what type of document?" We use WebOrganizer in Olmo 3 to partition both…

Olmo 3 has some neat pre-training data curation:
- @MayeeChen found much better ways to mix WebOrganizer domains
- We use quality signals not as a filter (0/1) but for setting # epochs per sample (0-7x), but any duplicates would distort this ➡️ Run global dedup across 39B docs 🤯



Sasha Rush reposted

🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from Apriel-Nemotron-15B-Thinker reasoner.
✅ Navigating throughput performance tradeoff with up to 3.4x speedup
✅ 2x speedup without performance loss
✅ Efficient distillation…

Sasha Rush reposted

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

Sasha Rush reposted

I'll be among dozens (hundreds?) of VCs attending NeurIPS this year, but among the few who might be more interested in topics like managing episodic memory with RL, avoiding model collapse when training with synthetic data, and more effectively using base models to guide…


Sasha Rush reposted

Who would be in for Cursor Conf?


Sasha Rush reposted

Jacob Andreas (@jacobandreas) on "the specification problem"
Can we build interactive systems for task specification?
LM as an interviewer about the task
Use the interview transcript as the task prompt
This outperforms or is competitive to active learning or user-designed…

Check out the IVADO workshop on Deploying Autonomous Agents: Lessons, Risks and Real-World Impact happening today until Wednesday in Montreal with an exciting lineup of speakers #Agents #LLMs ivado.ca/en/events/2nd-…



Sasha Rush reposted

how we trained composer-1 by @srush_nlp youtube.com/watch?v=md8D8e…

Sasha Rush reposted

some points from the talk
- for the agent RL, the RL rollouts try to mimic how cursor works in production at scale including cursor as environment
- try to keep training/inference similar so they use same tool call formats in prod
infra architecture
- trainer server (pytorch…

Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. youtube.com/watch?v=md8D8e…

Ray Summit 2025 Keynote: Building Cursor Composer with Sasha Rush



Sasha Rush reposted

Interesting to hear this six-month-old podcast where we discuss ideas that later evolved into what's now Online Tab RL and Composer.

A conversation on the optimal reward for coding agents, infinite context models, and real-time RL



This paper is really cool! Big fan of this style of interpretability, nice to see it scaled up a bit.

Excited to share our latest work on untangling language models by training them with extremely sparse weights! We can isolate tiny circuits inside the model responsible for various simple behaviors and understand them unprecedentedly well. openai.com/index/understa…



Sasha Rush reposted

Honored to receive the Computer Science Canada Outstanding Early Career Researcher award 🏅. It is a recognition of the work carried out by my students for their courage to push fundamental ideas in natural language processing even in the era of LLMs. Thanks to my mentors and…

Congratulations to Siva Reddy (@sivareddyg), Core Academic Member at Mila, who has received the prestigious Outstanding Early Career Computer Science Researcher Award from @CSCan_InfoCan , the leading organization for the computer science community in Canada.…


Sasha Rush reposted

COLM is going to San Francisco for 2026!
🗓️Dates: October 6-9, 2026
🏨Venue: Hilton San Francisco Union Square
Website and CFPs for papers and workshops coming up soon!

Sasha Rush reposted

i have mostly stopped using coding models other than composer-1 and tab

I think cursor might just have the mandate of heaven now. this composer 1 model is incredible and its been getting better (vibes). I think raw iq is no longer the bottleneck. its just reliability of tool use and harnessing


