what there is no overwhelming agreement on is what will happen the day after. how do we build a livable future here, for both israelis and palestinians? israel's leadership is not thinking about this now, and the future looks grim. THIS is where protest effort and outrage should be invested.
remind me why we call the LLM reasoning samples during training "trajectories" and not "sampled responses"?
wait so the GRPO everyone is drooling about is just REINFORCE with the baseline computed as an average over a large sample (and the usual kl regularization in llms)?
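To make the "REINFORCE with a group-average baseline" reading concrete, here is a minimal sketch of just the advantage computation: sample several responses to one prompt, score them, and subtract the group mean. The function name `group_baseline_advantages`, the toy rewards, and the optional division by the group standard deviation are assumptions for illustration; the kl term and the policy-gradient step itself are left out.

```python
from statistics import mean, pstdev


def group_baseline_advantages(rewards, normalize=False, eps=1e-8):
    """Advantages for one prompt's group of sampled responses.

    The baseline is the mean reward of the group, so each response is
    scored relative to its siblings. With normalize=True the centered
    rewards are also divided by the group's standard deviation (an
    assumption here; eps guards against a zero-variance group).
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    scale = pstdev(rewards) + eps
    return [c / scale for c in centered]


# Toy group: four sampled responses, two judged correct (reward 1.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_baseline_advantages(rewards)
normalized = group_baseline_advantages(rewards, normalize=True)
```

In a REINFORCE-style update, each response's log-probability gradient would then be weighted by its advantage, so responses above the group average are pushed up and those below are pushed down.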
this classic figure is wrong.
I complain a lot about RL lately, and here we go again. The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment. More at length here: gist.github.com/yoavg/3eb3e722…
candy attracts mutant ants with variable number of legs
this is a survey. when you think of a "model" as in "model based RL", what do you have in mind? (in other words, what is a model in this sense?)
turns out some disciplines/people use "a bellman equation" to mean *any* recursive equation which is amenable to DP. in that sense clearly the concept is important. i was talking specifically about the update rule for computing a value function using tabulation.
why are the bellman equations considered foundational or important today? aren't they just a straightforward application of DP to solve a problem that only arises in extremely simplified cases that never occur in practice?
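For reference, the narrow sense meant here (the update rule for computing a value function by tabulation) looks roughly like the sketch below: sweep over a table of states and replace each value with the best one-step lookahead. The function name `value_iteration`, the transition format, and the two-state toy MDP are all assumptions made up for illustration.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10, max_sweeps=10_000):
    """Tabular value iteration.

    transitions[state][action] is a list of (prob, next_state, reward)
    tuples. A state with no actions is treated as terminal (value 0).
    Values are updated in place during each sweep.
    """
    values = {s: 0.0 for s in transitions}
    for _ in range(max_sweeps):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state, nothing to back up
                continue
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # table stopped changing
            break
    return values


# Hypothetical toy MDP: from "start" you can "go" to the terminal "goal"
# for reward 1, or "stay" where you are for reward 0.
toy_mdp = {
    "start": {
        "go": [(1.0, "goal", 1.0)],
        "stay": [(1.0, "start", 0.0)],
    },
    "goal": {},
}
values = value_iteration(toy_mdp)
```

The table has one entry per state, which is exactly why this form only works when the state space is small enough to enumerate.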
it is not "a confession". stop calling things by the most misleading names you can.
what's the least french sandwich you could think of?
Say what you want about the French, they get sandwiches right... 👇👇👇💯
i am feeling a bit sick and cannot concentrate, perfect time for some science-adjacent online fighting
hot take on the bun purchase: turns out coding agents cannot replace engineers just yet, huh.
actually this should make companies more reluctant to rely on bun, not less.
People frequently ask: > How is Bun sustainable? If I bet my company’s tech stack on Bun, will Bun still be around in a few years? We didn’t have a great answer to this question, until today