Ok @elonmusk here you go. I quickly collected a few of the benchmarks I know. Criteria: run by a university or non-profit, have an active leaderboard, don't list Grok 4, and might benefit from compute tokens for their evaluation. Note: The last part is from groups doing active research…
New paradox unlocked: having to wake up before your bedtime.
Seriously, when will Anthropic stop with these clickbait studies?
Nothing to see here ;-) @Nature feature, by @SilverJacket nature.com/articles/d4158…

Who agrees that among developers Qwen Image Edit is by far the number 1 image editing model? Why? The number of possibilities that the open source nature of the model provides is worth 100x more than a couple of benchmark points. For any given use case you can just fine tune it…
Thank you @ArtificialAnlys ! 🙏 Qwen Image Edit 2509 ranks #3 overall and leads all open-weight models — enabling multi-image editing with precise control. Try it now: chat.qwen.ai/?inputFeature=…
For all intents and purposes, and despite what half of the timeline claims, you are NOT arguing with either of these two.

General approach to AI Agents: "An Agent that is able to solve every imaginable problem and is running only on one SOTA LLM." Ground truth: "Deploying swarms of specialized Agents running on specialized SLMs is more reliable, achieves better results and is easier to maintain."
OpenAI doing a “bait and switch” again. People have already started to complain about Sora quality dropping. I didn’t notice it myself, as I don’t watch the videos. That’s why I’m sticking to xAI. Grok only ever gets better.
Now we know why OpenAI hasn’t been able to fix the model router for 2 months, they’re trying to vibe code it 🤣
92% of OpenAI engineers are using Codex - up from 50%. Nearly all PRs are reviewed now with Codex

The author states that he was charged for around 9k output tokens and that GPT-5 pro took ~6 min to generate the output. This is through the new API. This means the model is generating ~25 t/s. Can anyone confirm this? If this is true, the API is basically useless.
I got the new GPT-5 pro API model to "Generate me an SVG of a pelican riding a bicycle". This pelican took 6m8s to generate and cost me $1.10! simonwillison.net/2025/Oct/6/gpt…
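A quick sanity check of the ~25 t/s estimate above, as a minimal sketch. It only uses the figures quoted in the two posts (roughly 9k billed output tokens, ~6 min, and the more precise 6m8s); the function name `tokens_per_second` is made up for illustration. The result is an average over wall-clock time, so it includes any delay before the first visible token.

```python
def tokens_per_second(output_tokens: int, elapsed_seconds: float) -> float:
    """Average throughput: billed output tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return output_tokens / elapsed_seconds


# Figures taken from the posts; 9k is itself an approximation ("around 9k").
OUTPUT_TOKENS = 9_000

# Rounded duration from the first post: ~6 minutes.
rounded_rate = tokens_per_second(OUTPUT_TOKENS, 6 * 60)

# Exact duration from the quoted post: 6m8s.
timed_rate = tokens_per_second(OUTPUT_TOKENS, 6 * 60 + 8)

print(f"rounded: {rounded_rate:.1f} t/s, timed: {timed_rate:.1f} t/s")
```

Both durations land at about 24 to 25 t/s, so the estimate holds as long as the 9k token count is accurate.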
Why does Ani keep yapping and obsessing about this, as of now nonexistent, open source model from @xai called Grok Nano? She claims 9B params and 128k context window. Is she hallucinating again or does she know something more than us?
Definitely worth checking out. A fully open source diffusion coding model from @SFResearch
Today my team at @SFResearch drops CoDA-1.7B: a text diffusion coding model that outputs tokens bidirectionally in parallel. ⚡️ Faster inference, 1.7B rivaling 7B. 📊 54.3% HumanEval | 47.6% HumanEval+ | 55.4% EvalPlus 🤗HF: huggingface.co/Salesforce/CoD… Any questions, lmk!
