The leaked Gemini 3 benchmarks look good and all, but I'm a little disappointed in the coding results. I guess I'll most likely be sticking with Claude

neetcode1's tweet image. The leaked Gemini 3 benchmarks look good and all, but I'm a little disappointed in the coding results. I guess I'll most likely be sticking with Claude

I don't get why there's no comparison to Opus 4.1, which is way smarter than Sonnet 4.5.


Why guess when you can know?


I’m not one to say to trust the benchmarks HOWEVER if it is that similar to Sonnet, that’s still a win imo in AI studio, Gemini 3 Pro is significantly faster than Sonnet, which would let you iterate a lot faster


to be fair, the benchmarks don’t do it justice as far as how it feels to actually code with it


It’s also about speed. Most projects do a decent job already. If speed if much faster than Claude with almost similar accuracy. It’s still a Win.


+one thousand ELO on LiveCodeBench Pro and 2:1 winrate vs 4.5 on Design Bench seems worth giving a shot


It’s honestly funny how Anthropic has been able to stay as the best coding model company


Gemini 3 Pro absolutely smokes and destroys all models in front-end taste


Anyone who has worked with actual industry codebases will soon quickly realise how ahead Claude is compared to every other llm. Gpt 5 codex is good, 3.7 level but not as good as 4 (not to mention it's often slow compared to Claude on ghcp)


SweBench sucks ass


I don't think benchmark are reliable, I was having an issue with writing a sql stored procedure and was constantly getting an error every top model failed from opus 4.1, gpt 5.1, claude 4.5 thinking but gemini 2.5 correctly identified that dbeaver was the issue


Claude is still unbeatable in coding


It's a 1% difference lol


Get 30% off Clip Studio Paint Ver. 4.0 now during our Holiday Sale!✨ Ends December 25, 8:00 am UTC


Can someone ELI5 the Vending Bench task?? Metric is net worth ($) 🤣


I'll act like I understood what these benchmarks mean irl lol


Do they have a cli?


@neetcode1 is your pro course down?


It's faster, cheaper, overall smarter. It is just absolutely nonsensical to let one benchmark result, which is mind you practically on par, dissuade you from using the model entirely


idk how you use the gemini but it completely smokes the rest of the models in everything except agentic which almost neck to neck to claude and keep in mind this is only the preview version


how are those coding tests conducted btw? how are the problems defined?


welcome to the new AI winter


benchmarks are not a religions bro lol


buddy it’s one percent


5 facts every investor should know about the NSDQ-100


United States 트렌드
Loading...

Something went wrong.


Something went wrong.