Anes Valentic

@Matrix_Memories

| AI Developer | ex: MunichRe, TU Munich | Cigarettes, Coffee, and Math at all hours |

Pinned

Ok @elonmusk, here you go. I quickly collected a couple of the benchmarks I know. Criteria: run by a university or non-profit, has an active leaderboard, doesn't list Grok 4, and might benefit from compute tokens for its evaluation. Note: the last part is from groups doing active research…

Ok, who?



New paradox unlocked: having to wake up before your bedtime.


Seriously, when will Anthropic stop with these clickbait studies?

Nothing to see here ;-) @Nature feature, by @SilverJacket nature.com/articles/d4158…



Who agrees that, among developers, Qwen Image Edit is by far the #1 image editing model? Why? The possibilities that the model's open-source nature opens up are worth 100x more than a couple of benchmark points. For any given use case, you can just fine-tune it…

Thank you @ArtificialAnlys ! 🙏 Qwen Image Edit 2509 ranks #3 overall and leads all open-weight models — enabling multi-image editing with precise control. Try it now: chat.qwen.ai/?inputFeature=…



For all intents and purposes, and despite what half of the timeline claims, you are NOT arguing with either of these two.


General approach to AI Agents: "An Agent that is able to solve every imaginable problem and is running only on one SOTA LLM." Ground truth: "Deploying swarms of specialized Agents running on specialized SLMs is more reliable, achieves better results and is easier to maintain."


OpenAI is doing a “bait and switch” again. People have already started complaining about Sora's quality dropping. I didn’t notice it myself, since I don’t watch the videos. That’s why I’m sticking with xAI. Grok only ever gets better.

I feel like Sora quality has gotten worse every day…



Now we know why OpenAI hasn’t been able to fix the model router for 2 months, they’re trying to vibe code it 🤣

92% of OpenAI engineers are using Codex, up from 50%. Nearly all PRs are now reviewed with Codex.


The author states that he was charged for around 9k output tokens and that GPT-5 pro took ~6 min to generate the output through the new API. That works out to roughly 25 tokens/s (9,000 tokens / 360 s). Can anyone confirm this? If true, the API is basically useless.

I got the new GPT-5 pro API model to "Generate me an SVG of a pelican riding a bicycle". This pelican took 6m8s to generate and cost me $1.10! simonwillison.net/2025/Oct/6/gpt…
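As a quick sanity check on the ~25 t/s figure, here is a minimal Python sketch that divides the reported output-token count by the wall-clock time from the quoted post (6m8s). The helper name `tokens_per_second` is hypothetical. The sketch assumes that the full 6m8s was spent generating and that ~9k is the number of tokens billed as output. Any queueing or network overhead inside that window would make the true generation rate somewhat higher than this estimate.

```python
def tokens_per_second(output_tokens: int, minutes: int, seconds: int = 0) -> float:
    """Average throughput: output tokens divided by elapsed wall-clock seconds.

    Hypothetical helper; assumes the whole elapsed time was spent generating.
    """
    elapsed = minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return output_tokens / elapsed


# Figures from the posts above: ~9k output tokens, 6m8s wall-clock time.
print(f"{tokens_per_second(9_000, 6, 8):.1f} tokens/s")  # → 24.5 tokens/s
# Rounded "~6 min" version used in the post.
print(f"{tokens_per_second(9_000, 6):.1f} tokens/s")  # → 25.0 tokens/s
```

Both the exact (6m8s) and rounded (6 min) timings land at about 25 tokens/s, so the estimate in the post holds under these assumptions.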



Why does Ani keep yapping and obsessing about Grok Nano, an open-source model from @xai that, as of now, doesn't exist? She claims it has 9B params and a 128k context window. Is she hallucinating again, or does she know something we don't?


Definitely worth checking out: a fully open-source diffusion coding model from @SFResearch.

Today my team at @SFResearch drops CoDA-1.7B: a text diffusion coding model that outputs tokens bidirectionally in parallel. ⚡️ Faster inference, 1.7B rivaling 7B. 📊 54.3% HumanEval | 47.6% HumanEval+ | 55.4% EvalPlus 🤗HF: huggingface.co/Salesforce/CoD… Any questions, lmk!



How many packs do you need for a night out?
