NeetCode

@neetcode1

Nov 18

The leaked Gemini 3 benchmarks look good and all, but I'm a little disappointed in the coding results. I guess I'll most likely be sticking with Claude

Daniel Tenner

@swombat

Nov 18

I don't get why there's no comparison to Opus 4.1, which is way smarter than Sonnet 4.5.

I’m not one to say to trust the benchmarks. HOWEVER, if it is that similar to Sonnet, that’s still a win imo. In AI Studio, Gemini 3 Pro is significantly faster than Sonnet, which would let you iterate a lot faster.

bin

@flowybin

Nov 18

to be fair, the benchmarks don’t do it justice as far as how it feels to actually code with it

Divvy

@LazyDev47

Nov 18

It’s also about speed. Most projects do a decent job already. If it’s much faster than Claude with almost the same accuracy, it’s still a win.

Jacob Asmuth

@JacobAsmuth

Nov 18

+1,000 Elo on LiveCodeBench Pro and a 2:1 win rate vs. 4.5 on Design Bench seem worth giving a shot

Fredd

@NotFredd3

Nov 18

It’s honestly funny how Anthropic has been able to stay as the best coding model company

vibebuilder

@vibebuild

Nov 18

Gemini 3 Pro absolutely smokes and destroys all models in front-end taste

⸻⸻

@darkknightisbac

Nov 18

Anyone who has worked with actual industry codebases will quickly realise how far ahead Claude is compared to every other LLM. GPT-5 Codex is good, around Claude 3.7 level but not as good as 4 (not to mention it's often slow compared to Claude on GHCP).

Nik Knack ☕

@SgtPepper901

Nov 18

SWE-bench sucks ass

Faiz

@phase_shake

Nov 18

I don't think benchmarks are reliable. I was having an issue writing a SQL stored procedure and kept getting an error. Every top model failed, from Opus 4.1 and GPT-5.1 to Claude 4.5 Thinking, but Gemini 2.5 correctly identified that DBeaver was the issue.

Arsenije Karpic

@RealArsenije

Nov 18

Claude is still unbeatable in coding

Shashank Aditya

@msrsaditya

Nov 18

It's a 1% difference lol

Fezzy

@hafezverde

Nov 18

Can someone ELI5 the Vending Bench task?? Metric is net worth ($) 🤣

yash shukla

@yashshu58249885

Nov 18

I'll act like I understood what these benchmarks mean irl lol

Ben

@WillemsBen

Nov 18

Do they have a CLI?

Zoomba Mastra

@ZMastra

Nov 18

@neetcode1 is your pro course down?

DreamGzer 2023

@Dreamgzer241476

Nov 18

It's faster, cheaper, and overall smarter. It is just absolutely nonsensical to let one benchmark result, which, mind you, is practically on par, dissuade you from using the model entirely.

Han

@hanebox

Nov 18

idk how you use Gemini, but it completely smokes the rest of the models in everything except agentic tasks, where it's almost neck and neck with Claude. And keep in mind this is only the preview version.