Gustav

@guidotrev

rl enthusiast | math @ uchicago | @zfellows

CA, San Francisco

venturepoint.vc

Joined April 2024

475 Posts 719 Followers 756 Following

Gustav reposted

Philipp Moritz

@pcmoritz

Oct 21

We are happy to announce SkyRL tx 0.0.3! SkyRL tx is an open source library that implements a backend for the Tinker API and allows people to set up their own Tinker-like service running on their own hardware. This release has full MoE support, better checkpointing and the first…

novasky-ai.notion.site

SkyRL tx v0.0.3 Release

Philipp Moritz, Tyler Griggs, and the SkyRL Team

Source: novasky-ai.notion.site

Gustav

@guidotrev

Oct 1

i don't understand the asynchronous rl claim for higher throughput. you can colocate training and generation on the same set of gpus and the switching bottleneck is minimal. this still achieves high throughput while avoiding off-policy training.

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

Sep 28

practical, modern GRPO tweaks as described in Meta's Code World Models paper

Gustav

@guidotrev

Sep 29

thank god kl is useless 🙏 fucking hate having to deal with the ref model
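For context on the ref-model complaint: a common GRPO-style setup adds a per-token KL penalty against a frozen reference model, estimated from the logprobs of the sampled tokens, which is why that model has to be kept around and run. A minimal Python sketch, assuming the widely used "k3" estimator; the function names and the coefficient `beta` are hypothetical, not any particular framework's API. Setting `beta` to zero is what removes the need for reference-model logprobs.

```python
import math

# Minimal sketch (illustrative names, not a specific framework's API).
# Inputs are log-probabilities of the same sampled tokens under the
# policy being trained and under a frozen reference model.


def kl_penalty_k3(policy_logprobs, ref_logprobs):
    """Per-token 'k3' estimate of KL(policy || reference); always >= 0."""
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("logprob sequences must have equal length")
    penalties = []
    for lp, ref_lp in zip(policy_logprobs, ref_logprobs):
        log_ratio = ref_lp - lp  # log(pi_ref / pi_theta) for the sampled token
        penalties.append(math.exp(log_ratio) - log_ratio - 1.0)
    return penalties


def kl_loss_term(policy_logprobs, ref_logprobs, beta):
    """Mean per-token KL penalty scaled by a hypothetical coefficient beta."""
    if beta == 0.0:
        # KL dropped entirely: no reference-model forward pass is needed.
        return 0.0
    penalties = kl_penalty_k3(policy_logprobs, ref_logprobs)
    if not penalties:
        return 0.0  # empty sequence: nothing to penalize
    return beta * sum(penalties) / len(penalties)


# Usage: two sampled tokens, small KL coefficient.
loss = kl_loss_term([-1.2, -0.7], [-1.0, -0.9], beta=0.04)
```

Each term is zero exactly when the policy and reference assign the same logprob to a token and grows as they diverge.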

Gustav reposted

Makar Kuznietsov

@MakarKuznietsov

Sep 27

I am making agents that fix performance bottlenecks in code. Here, it made search in @WerWolv ImHex 2x faster! End-to-end, producing a ready-to-compile updated project with no changes in functionality

Gustav

@guidotrev

Sep 25

claude code is an outrageous reward hacker

Gustav

@guidotrev

Sep 25

it's all about logprobs

Gustav

@guidotrev

Sep 15

ml bugs are the worst

Gustav

@guidotrev

Sep 14

yo chat?

tp=1 pp=2 60%

tp=2 pp=1 40%

5 votes · Final results

Gustav

@guidotrev

Sep 13

great read

Caiming Xiong

@CaimingXiong

Sep 9

Meet SFR-DeepResearch (SFR-DR) 🤖: our RL-trained autonomous agents that can reason, search, and code their way through deep research tasks. 🚀SFR-DR-20B achieves 28.7% on Humanity's Last Exam (text-only) using only web search 🔍, browsing 🌐, and Python interpreter 🐍,…

Gustav

@guidotrev

Sep 13

cool graph from @thinkymachines blog that shows performance peaks when batches are size 2^n

Gustav

@guidotrev

Sep 12

Gustav

@guidotrev

Sep 10

non verbose claude code will be agi imo

Gustav

@guidotrev

Sep 10

day by day. build. show. be patient. build.

Gustav

@guidotrev

Sep 8

Air mattress reactivated 🦧

Gustav

@guidotrev

Sep 4

rl frameworks fail or succeed based on how photons hit silicon on your laptop. the same script that worked 12 days ago with pinned dependencies and versions now fails

Gustav

@guidotrev

Sep 4

Sometimes I open my stripe and realize my saas autopilot side hustle casually made $150 yesterday 🦍

Gustav

@guidotrev

Sep 3

i saw another tweet about someone beating a benchmark with rl after training directly on it and open sourcing the project. what's literally the point? are startups just showing off or have we forgotten basic train/val/test splits ever since rl went mainstream?
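One way to keep benchmark test items out of RL training is to assign every item to train, val, or test deterministically, so the held-out set stays fixed across runs. A minimal Python sketch, assuming each item has a stable string ID; the function names and the 10%/10% fractions are hypothetical choices for illustration, not a standard API.

```python
import hashlib


def assign_split(example_id, val_frac=0.1, test_frac=0.1):
    """Deterministically map a stable item ID to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # pseudo-uniform value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_dataset(example_ids, val_frac=0.1, test_frac=0.1):
    """Group IDs by split; the same ID always lands in the same split."""
    splits = {"train": [], "val": [], "test": []}
    for example_id in example_ids:
        splits[assign_split(example_id, val_frac, test_frac)].append(example_id)
    return splits


# Usage: reward tuning touches only "train", model selection uses "val",
# and "test" is scored once at the end.
splits = split_dataset([f"q{i}" for i in range(1000)])
```

Hashing the ID rather than shuffling means adding new items later never moves existing ones between splits.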