Ankesh Anand

@ankesh_anand

Research scientist @googledeepmind (Gemini Thinking, Post-Training), prev phd @milamontreal. RL for Gemini 2.5, IMO DeepThink, ComputerUse. Opinions are my own.

Pinned

2.5 Pro is our new frontier model: fresh big model smell with extremely strong reasoning / thinking capabilities. We report single attempt / pass@1 scores for clean comparisons.


Here we go! A new 2.5 Pro with all-around capability improvements compared to previous versions.

- Much better at code editing now, SOTA on Aider (82.2); try out this model in Cursor!
- #1 on WebDev Arena (surpassing Opus 4).
- Supports thinking budgets now (128 to 32k).
- Much better at…


📈📈📈


Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial number of points (24.4%). The speed of progress is really mind-blowing.



shoutout to the believers!


The whole surprise over the $5.5M figure was because everyone is anchored to Llama 3's compute efficiency. Wenfeng himself said it's about two generations behind frontier-lab numbers. Sonnet costs "tens of millions" of dollars; I hope we release the 2.0 Flash / Flash Thinking numbers as…

