
Transluce
@TransluceAI
Open and scalable technology for understanding AI systems.
Joined October 2024
Transluce reposted
Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables,…
OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!
