NeuralComputing reposted
DPO Debate: Is RL needed for RLHF? We cannot yet settle whether DPO or RL is better, but working through the question is at least a good exercise.
1. Derivations in the DPO paper (hint: the authors are good at math)
2. cDPO, IPO, and related equations
3. Speculation on potential oddities of DPO vs RL…