Chi Nguyen reposted
How do LLMs reason about playing games against copies of themselves? 🪞We made the first LLM decision theory benchmark to find out. 🧵1/10
Chi Nguyen reposted
How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks.