Gustav

@guidotrev

rl enthusiast | math @ uchicago | @zfellows

Gustav reposted

We are happy to announce SkyRL tx 0.0.3! SkyRL tx is an open source library that implements a backend for the Tinker API and allows people to set up their own Tinker-like service running on their own hardware. This release has full MoE support, better checkpointing and the first…

novasky-ai.notion.site

SkyRL tx v0.0.3 Release

Philipp Moritz, Tyler Griggs, and the SkyRL Team


i don't understand the asynchronous rl claim for higher throughput. you can colocate training and generation on the same set of gpus and the switching bottleneck is minimal. this still achieves high throughput while avoiding off policy training.

practical, modern GRPO tweaks as described in Meta's Code World Models paper

Image from iScienceLuvr's tweet.


thank god kl is useless 🙏 fucking hate having to deal with the ref model


Gustav reposted

I am making agents that fix performance bottlenecks in code. Here, it made search in @WerWolv ImHex 2x faster! End-to-end, producing a ready-to-compile updated project with no changes in functionality


claude code is an outrageous reward hacker


it's all about logprobs


ml bugs are the worst


yo chat?

tp=1 pp=2: 60%
tp=2 pp=1: 40%

5 votes · Final results


great read

Meet SFR-DeepResearch (SFR-DR) 🤖: our RL-trained autonomous agents that can reason, search, and code their way through deep research tasks. 🚀SFR-DR-20B achieves 28.7% on Humanity's Last Exam (text-only) using only web search 🔍, browsing 🌐, and Python interpreter 🐍,…

Image from CaimingXiong's tweet.


cool graph from @thinkymachines blog that shows performance peaks when batches are size 2^n

non verbose claude code will be agi imo


day by day. build. show. be patient. build.


Air mattress reactivated 🦧


rl frameworks fail or succeed based on how photons hit silicon on your laptop. the same script that worked 12 days ago with pinned dependencies and versions now fails

Sometimes I open my stripe and realize my saas autopilot side hustle casually made $150 yesterday 🦍

i saw another tweet about someone beating a benchmark with rl after training directly on it and open sourcing the project. what's literally the point? are startups just showing off or have we forgotten basic train/val/test splits ever since rl went mainstream?

