what there is no overwhelming agreement on is what will happen the day after. how do we build a livable future here, for both israelis and palestinians? israel's leadership is not thinking about this now, and the future looks grim. THIS is where protest effort and outrage should be invested.
remind me why we call the LLM reasoning samples during training "trajectories" and not "sampled responses"?
wait, so the GRPO everyone is drooling over is just REINFORCE with the baseline computed as an average over a large sample (plus the usual KL regularization used for LLMs)?
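To make the comparison concrete, here is a minimal sketch of REINFORCE with a group-mean baseline and a KL penalty toward a frozen reference policy. It assumes a toy setting (one prompt, four possible "responses", a softmax policy over them) so the gradients can be written by hand; every function name and hyperparameter below is hypothetical. Note that the published GRPO additionally divides each advantage by the group's reward standard deviation and uses a PPO-style clipped importance ratio; both are omitted here.

```python
import math
import random
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def group_mean_advantages(rewards: Sequence[float]) -> List[float]:
    """Advantage = reward minus the mean reward of the group sampled for the same prompt."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def kl_grad(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Exact gradient of KL(p || q) w.r.t. the logits that produce p via softmax."""
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return [pk * (math.log(pk) - math.log(qk) - kl) for pk, qk in zip(p, q)]


def reinforce_group_step(logits, ref_probs, reward_fn: Callable[[int], float],
                         group_size: int, lr: float, beta: float, rng: random.Random):
    """One REINFORCE update with a group-mean baseline and a KL penalty to a reference policy."""
    probs = softmax(logits)
    actions = rng.choices(range(len(logits)), weights=probs, k=group_size)
    advs = group_mean_advantages([reward_fn(a) for a in actions])
    # Score-function gradient: grad of log softmax(a) w.r.t. logits = onehot(a) - probs.
    pg = [0.0] * len(logits)
    for a, adv in zip(actions, advs):
        for k in range(len(logits)):
            pg[k] += adv * ((1.0 if k == a else 0.0) - probs[k]) / group_size
    kg = kl_grad(probs, ref_probs)
    # Gradient ascent on (expected reward - beta * KL).
    return [z + lr * (g - beta * h) for z, g, h in zip(logits, pg, kg)]


def train_toy(steps: int = 300, seed: int = 0) -> List[float]:
    """Hypothetical toy task: only response 2 is rewarded."""
    rng = random.Random(seed)
    logits = [0.0] * 4
    ref_probs = softmax(logits)  # frozen reference policy = the initial policy
    reward_fn = lambda a: 1.0 if a == 2 else 0.0
    for _ in range(steps):
        logits = reinforce_group_step(logits, ref_probs, reward_fn,
                                      group_size=8, lr=0.5, beta=0.01, rng=rng)
    return softmax(logits)
```

Calling `train_toy()` shifts most of the probability mass onto response 2. If every sample in a group gets the same reward, all advantages are zero and that group contributes no policy-gradient signal, which is the usual behavior of a group-mean baseline.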
this classic figure is wrong.
I complain a lot about RL lately, and here we go again. The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment. More at length here: gist.github.com/yoavg/3eb3e722…
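A toy sketch of the setup argued for above, under our own assumptions (all class and function names are hypothetical and not taken from the linked gist): the environment's `step` returns only an observation, and the reward function is a constructor argument of the agent. Two agents with different internal rewards then behave differently in the very same environment.

```python
import random
from typing import Callable, Dict


class CounterEnv:
    """Hypothetical toy world: an integer the agent nudges up or down.
    Its interface returns observations only; it knows nothing about reward."""

    def __init__(self) -> None:
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> int:
        self.state += action
        return self.state


class Agent:
    """Epsilon-greedy learner whose reward function is one of its own components."""

    def __init__(self, reward_fn: Callable[[int, int], float],
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.reward_fn = reward_fn  # (previous obs, new obs) -> scalar, computed inside the agent
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Optimistic initial estimates so that both actions get tried early on.
        self.values: Dict[int, float] = {-1: 1.0, 1: 1.0}
        self.counts: Dict[int, int] = {-1: 0, 1: 0}

    def act(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.choice([-1, 1])
        return max(self.values, key=lambda a: self.values[a])

    def learn(self, action: int, prev_obs: int, obs: int) -> None:
        reward = self.reward_fn(prev_obs, obs)  # the agent scores its own experience
        self.counts[action] += 1
        # Incremental sample mean of the rewards observed for this action.
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def run(agent: Agent, env: CounterEnv, steps: int = 200) -> int:
    obs = env.reset()
    for _ in range(steps):
        action = agent.act()
        new_obs = env.step(action)
        agent.learn(action, obs, new_obs)
        obs = new_obs
    return obs
```

With `Agent(lambda prev, new: float(new > prev))` the counter drifts upward; with `float(new < prev)` it drifts downward, in an unchanged `CounterEnv`. Changing what counts as reward means constructing a different agent, not a different world.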
candy attracts mutant ants with variable number of legs
this is a survey. when you think of a "model" as in "model-based RL", what do you have in mind? (in other words, what is a model in this sense?)
turns out some disciplines/people use "a bellman equation" to mean *any* recursive equation which is amenable to DP. in that sense clearly the concept is important. i was talking specifically about the update rule for computing a value function using tabulation.
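For concreteness, here is one common form of that tabular update, the Bellman optimality backup as used in value iteration, run on a hypothetical three-state MDP. The dictionary layout of `transitions` and the toy MDP itself are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward).
# A state with no actions is treated as terminal (value 0).
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]

# Hypothetical toy MDP: from 0 you can "stay" or "go" to 1; from 1, "finish" pays 1 and ends.
TOY_MDP: MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"finish": [(1.0, 2, 1.0)]},
    2: {},
}


def value_iteration(transitions: MDP, gamma: float = 0.9, tol: float = 1e-10,
                    max_iters: int = 10_000) -> Dict[int, float]:
    """Repeat the tabular Bellman optimality backup until the value table stops changing."""
    V = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        new_V: Dict[int, float] = {}
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                new_V[s] = 0.0
                continue
            # V(s) <- max_a sum_{s'} P(s'|s,a) * (r + gamma * V(s'))
            new_V[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(new_V[s] - V[s]))
        V = new_V
        if delta < tol:
            break
    return V
```

On `TOY_MDP` this converges in three sweeps to V(1) = 1 and V(0) = γ·V(1) = 0.9, with the terminal state at 0.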
why are the bellman equations considered foundational or important today? aren't they just a straightforward application of DP to solve a problem that only arises in extremely simplified cases that never occur in practice?
it is not "a confession". stop calling things by the most misleading names you can.
what's the least french sandwich you could think of?
Say what you want about the French, they get sandwiches right...
i am feeling a bit sick and cannot concentrate, perfect time for some science-adjacent online fighting
hot take on the bun purchase: turns out coding agents cannot replace engineers just yet, huh.
actually this should make companies more reluctant to rely on bun, not less.
People frequently ask: "How is Bun sustainable? If I bet my company's tech stack on Bun, will Bun still be around in a few years?" We didn't have a great answer to this question, until today.