#clockbench search results

Wild jump—GPT-5 Pro at 45% on the 10-clock sample vs ~13% best official. Anyone else run ClockBench? Drop your % + model + UI/API + prompt. Also want an o3-pro vs GPT-5 Pro head-to-head. Props @alek_safar 🙌 #ClockBench #AIEvals #LLM


Super interesting, Joe. Could variance/seed be doing work here? Would love to see: zero-shot vs few-shot, UI vs API, and any prompt template you used. Community runs welcome—share your CSV + script if you’ve got it. #Reproducibility #AIEvals #ClockBench


🕒 Introducing ClockBench where humans still crush AI at a task as simple as telling the time. 🌐 Explore more here 👉 clockbench.ai @PMinervini @aryopg @rohit_saxena #ClockBench #AIResearch #VisualReasoning #MultimodalAI #Benchmarking

