TransluceAI's profile picture. Open and scalable technology for understanding AI systems.

Transluce

@TransluceAI

Open and scalable technology for understanding AI systems.

Transluce reposted

Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables,…

OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!

PKirgis's tweet image. OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!


United States Trends

Loading...

Something went wrong.


Something went wrong.