what there is no overwhelming agreement on is what will happen the day after. how do we build a livable future here, for both israelis and palestinians? israel's leadership is not thinking about this now, and the future looks grim. THIS is where protest effort and outrage should be invested.
remind me why we call the LLM reasoning samples during training "trajectories" and not "sampled responses"?
wait so the GRPO everyone is drooling about is just REINFORCE with the baseline computed as an average over a large sample (plus the usual kl regularization for llms)?
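A minimal sketch of the baseline described in the post above, assuming one scalar reward per sampled response to the same prompt. The function name and the reward values are hypothetical, chosen only for illustration. The sketch covers just the "reward minus group average" step; the published GRPO objective also normalizes by the group's standard deviation and uses a clipped importance ratio, and neither that nor the KL term is modeled here.

```python
from statistics import mean


def group_baseline_advantages(rewards):
    """Return REINFORCE-style advantages: each reward minus the group mean.

    `rewards` holds one scalar reward per response sampled for one prompt.
    """
    baseline = mean(rewards)  # the baseline is just the group's average reward
    return [r - baseline for r in rewards]


# Hypothetical rewards for four sampled responses to a single prompt.
advantages = group_baseline_advantages([1.0, 0.0, 0.0, 1.0])
```

Each advantage would then weight the log-probability gradient of its own response, as in plain REINFORCE with a baseline.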
this classic figure is wrong.
I complain a lot about RL lately, and here we go again. The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment. More at length here: gist.github.com/yoavg/3eb3e722…
candy attracts mutant ants with variable number of legs
this is a survey. when you think of a "model" as in "model based RL", what do you have in mind? (in other words, what is a model in this sense?)
turns out some disciplines/people use "a bellman equation" to mean *any* recursive equation which is amenable to DP. in that sense clearly the concept is important. i was talking specifically about the update rule for computing a value function using tabulation.
why are the bellman equations considered foundational or important today? aren't they just a straightforward application of DP to solve a problem that only arises in extremely simplified cases that never occur in practice?
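For readers who want to see what "the update rule for computing a value function using tabulation" refers to, here is a rough sketch of tabular value iteration, which repeatedly applies V(s) ← max_a Σ p · (r + γ · V(s')). The three-state MDP, its state names, and the `transitions` layout are made up for illustration and are not from the posts.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Tabular value iteration.

    transitions[state][action] is a list of (prob, next_state, reward) tuples.
    A state with no actions is treated as terminal and keeps value 0.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: nothing to update
                continue
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once a full sweep changes nothing noticeable
            return values


# Hypothetical deterministic chain: a -> b -> end, reward 1 on the last step.
toy_mdp = {
    "a": {"go": [(1.0, "b", 0.0)]},
    "b": {"go": [(1.0, "end", 1.0)]},
    "end": {},
}
values = value_iteration(toy_mdp)
```

The table has one entry per state, which is why this only works when the state space is small enough to enumerate.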
it is not "a confession". stop calling things by the most misleading names you can.
what's the least french sandwich you could think of?
Say what you want about the French, they get sandwiches right... 👇👇👇💯
i am feeling a bit sick and cannot concentrate, perfect time for some science-adjacent online fighting
hot take on the bun purchase: turns out coding agents cannot replace engineers just yet, huh.
actually this should make companies more reluctant to rely on bun, not less.
People frequently ask: > How is Bun sustainable? If I bet my company’s tech stack on Bun, will Bun still be around in a few years? We didn’t have a great answer to this question, until today