Sasha Rush
@srush_nlp
Researcher at Cursor https://www.youtube.com/@srush_nlp
Got addicted to @srush_nlp 's Tensor Puzzles, so I wrote a sequel with more puzzles: github.com/hardik-vala/Te…. Example:
SO lucky to have Alex intern with us through Olmo 3 development & see his massive contributions to our pretrain data 🐟Alex's created WebOrganizer (ICML 2025) which moved us beyond "quality? ✅❌" towards "what type of document?" We use WebOrganizer in Olmo 3 to partition both…
Olmo 3 has some neat pre-training data curation: - @MayeeChen found much better ways to mix WebOrganizer domains - We use quality signals not as a filter (0/1) but for setting # epochs per sample (0-7x), but any duplicates would distort this ➡️ Run global dedup across 39B docs 🤯
🚀 Introducing Apriel-H1: a family of seven 15B hybrid models (Transformer + Mamba) distilled directly from Apriel-Nemotron-15B-Thinker reasoner. ✅ Navigating throughput performance tradeoff with up to 3.4x speedup ✅ 2x speedup without performance loss ✅ Efficient distillation…
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
I'll be among dozens (hundreds?) of VCs attending NeurIPS this year, but among the few who might be more interested in topics like managing episodic memory with RL, avoiding model collapse when training with synthetic data, and more effectively using base models to guide…
Jacob Andreas (@jacobandreas) on "the specification problem" Can we build interactive systems for task specification? LM as an interviewer about the task Use the interview transcript as the task prompt This outperforms or is competitive with active learning or user-designed…
Check out the IVADO workshop on Deploying Autonomous Agents: Lessons, Risks and Real-World Impact happening today until Wednesday in Montreal with an exciting lineup of speakers #Agents #LLMs ivado.ca/en/events/2nd-…
how we trained composer-1 by @srush_nlp youtube.com/watch?v=md8D8e…
some points from the talk - for the agent RL, the RL rollouts try to mimic how cursor works in production at scale including cursor as environment - try to keep training/inference similar so they use same tool call formats in prod infra architecture - trainer server (pytorch…
Talk at Ray Summit on "Building Cursor Composer." Overview of the work from our research team. youtube.com/watch?v=md8D8e…
Ray Summit 2025 Keynote: Building Cursor Composer with Sasha Rush
Interesting to hear this six-month-old podcast where we discuss ideas that later evolved into what's now Online Tab RL and Composer.
A conversation on the optimal reward for coding agents, infinite context models, and real-time RL
This paper is really cool! Big fan of this style of interpretability, nice to see it scaled up a bit.
Excited to share our latest work on untangling language models by training them with extremely sparse weights! We can isolate tiny circuits inside the model responsible for various simple behaviors and understand them unprecedentedly well. openai.com/index/understa…
Honored to receive the Computer Science Canada Outstanding Early Career Researcher award 🏅. It is a recognition of the work carried out by my students for their courage to push fundamental ideas in natural language processing even in the era of LLMs. Thanks to my mentors and…
Congratulations to Siva Reddy (@sivareddyg), Core Academic Member at Mila, who has received the prestigious Outstanding Early Career Computer Science Researcher Award from @CSCan_InfoCan , the leading organization for the computer science community in Canada.…
COLM is going to San Francisco for 2026! 🗓️Dates: October 6-9, 2026 🏨Venue: Hilton San Francisco Union Square Website and CFPs for papers and workshops coming up soon!
i have mostly stopped using coding models other than composer-1 and tab
I think cursor might just have the mandate of heaven now. this composer 1 model is incredible and its been getting better (vibes). I think raw iq is no longer the bottleneck. its just reliability of tool use and harnessing