i saw another tweet about someone beating a benchmark with rl after training directly on it, then open-sourcing the project. what's literally the point? are startups just showing off, or have we forgotten basic train/val/test splits ever since rl went mainstream?

