(((ل()(ل() 'yoav))))👾

@yoavgo

Pinned

what there is no overwhelming agreement on is what will happen the day after: how do we build a livable future here for both Israelis and Palestinians? Israel's leadership is not thinking about this now, and the future looks grim. THIS is where protest effort and outrage should be invested.


remind me why we call the LLM reasoning samples during training "trajectories" and not "sampled responses"?


wait, so the GRPO everyone is drooling over is just REINFORCE with the baseline computed as an average over a large sample (plus the usual KL regularization in LLMs)?
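A minimal sketch of the comparison, assuming each sampled response to a prompt gets one scalar reward. REINFORCE with a mean baseline uses reward minus the group mean as the advantage. GRPO, as it is usually described, also divides by the group's standard deviation (population std here is an assumption; implementations vary) and wraps the update in a PPO-style clipped ratio, which this sketch leaves out along with the KL term. The function names are hypothetical.

```python
import statistics


def reinforce_mean_baseline_advantages(rewards):
    """REINFORCE with a baseline: advantage = reward - mean reward of the group."""
    baseline = statistics.fmean(rewards)
    return [r - baseline for r in rewards]


def grpo_style_advantages(rewards, eps=1e-8):
    """Group-relative advantage: subtract the group mean, then divide by the group std."""
    baseline = statistics.fmean(rewards)
    spread = statistics.pstdev(rewards)  # population std (assumption); eps guards spread == 0
    return [(r - baseline) / (spread + eps) for r in rewards]


def policy_gradient_loss(logprobs, advantages):
    """REINFORCE surrogate loss: -mean(advantage * log-prob of the sampled response).

    With differentiable log-probs, its gradient is the REINFORCE estimator.
    Clipping and KL regularization are deliberately omitted.
    """
    return -sum(a * lp for a, lp in zip(advantages, logprobs)) / len(logprobs)


# Four sampled responses to one prompt, reward 1.0 if correct else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(reinforce_mean_baseline_advantages(rewards))  # [0.5, -0.5, -0.5, 0.5]
print(grpo_style_advantages(rewards))  # each entry is roughly +1 or -1
```

The only difference between the two advantage functions is the per-group rescaling; both feed the same policy-gradient loss.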


this classic figure is wrong.


I've been complaining a lot about RL lately, and here we go again. The CS view of RL is wrong in how it thinks about rewards, already at the setup level. Briefly, the reward computation should be part of the agent, not part of the environment. More at length here: gist.github.com/yoavg/3eb3e722…
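A toy sketch of the setup the tweet describes, where the environment only returns observations and the agent computes its own reward from them. Everything here (the line-world `Environment`, the `Agent` with a `goal`, the distance-based reward, `run`) is hypothetical and only illustrates the stated idea; it is not the formulation in the linked gist.

```python
from dataclasses import dataclass


class Environment:
    """Hypothetical line world: it returns observations only, never a reward."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        self.position += action  # action is -1 or +1
        return {"position": self.position}


@dataclass
class Agent:
    goal: int
    total_reward: float = 0.0

    def act(self, observation):
        # Simple fixed policy: move toward the goal.
        return 1 if observation["position"] < self.goal else -1

    def reward(self, observation):
        # The reward lives inside the agent: distance to the agent's own goal.
        return -abs(observation["position"] - self.goal)


def run(agent, env, steps):
    obs = {"position": env.position}
    for _ in range(steps):
        obs = env.step(agent.act(obs))
        agent.total_reward += agent.reward(obs)  # agent scores its own observation
    return agent.total_reward
```

Because the reward is agent-side, two agents with different goals can share the same environment class unchanged, e.g. `run(Agent(goal=3), Environment(), 5)` and `run(Agent(goal=0), Environment(), 3)` score the same kind of observations differently.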





candy attracts mutant ants with a variable number of legs

What does this picture teach you??



this is a survey. when you think of a "model" as in "model-based RL", what do you have in mind? (in other words, what is a model in this sense?)


turns out some disciplines/people use "a Bellman equation" to mean *any* recursive equation that is amenable to DP. in that sense, the concept is clearly important. i was talking specifically about the update rule for computing a value function using tabulation.

why are the Bellman equations considered foundational or important today? aren't they just a straightforward application of DP to solve a problem that only arises in extremely simplified cases that never occur in practice?
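For concreteness, a minimal sketch of "the update rule for computing a value function using tabulation": tabular value iteration, which repeatedly applies the Bellman optimality backup to a table with one entry per state. It assumes a small, fully known MDP given as a dict; the two-state MDP and its rewards are hypothetical, and this variant updates the table in place within each sweep.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Tabular value iteration.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    Returns a dict mapping each state to its estimated optimal value.
    """
    V = {s: 0.0 for s in transitions}  # the table: one entry per state
    while True:
        delta = 0.0
        for s, action_outcomes in transitions.items():
            # Bellman optimality backup:
            # V(s) <- max_a sum_{s'} p(s'|s,a) * (r + gamma * V(s'))
            new_v = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in action_outcomes.values()
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update within the sweep
        if delta < tol:
            return V


# Hypothetical two-state MDP with deterministic transitions.
TWO_STATE_MDP = {
    "A": {"stay": [(1.0, "A", 0.0)], "move": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "move": [(1.0, "A", 0.0)]},
}

print(value_iteration(TWO_STATE_MDP))  # values close to A=19, B=20
```

With gamma = 0.9 the best policy is to move from A to B and then stay, so V(B) = 2 / (1 - 0.9) = 20 and V(A) = 1 + 0.9 * 20 = 19. The table needs an entry per state, which is exactly what stops scaling outside small, fully enumerated problems.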



it is not "a confession". stop calling things by the most misleading names you can.


what's the least French sandwich you could think of?

Say what you want about the French, they get sandwiches right... 👇👇👇💯



i am feeling a bit sick and cannot concentrate, perfect time for some science-adjacent online fighting


hot take on the Bun purchase: turns out coding agents cannot replace engineers just yet, huh.


actually, this should make companies more reluctant to rely on Bun, not less.

People frequently ask:
> How is Bun sustainable? If I bet my company’s tech stack on Bun, will Bun still be around in a few years?
We didn’t have a great answer to this question, until today.


