Nick Frolov

@v1becoder

Head of Product - AI Software Development Agent

A short moment to be proud of the work we do at refact.ai

We just updated the SWE-bench Multimodal leaderboard with new systems from @refact_ai, @allhands_ai and @TU_Muenchen. Congrats to all teams on pushing the state-of-the-art performance! SWE-bench Multimodal challenges AI systems to fix issues that are described using screenshots.


Inference cost will become the dominant operating expense for knowledge companies

Google has the best LLM and the worst scaffolding around it in their @GeminiApp. It can't understand its own short-link format, which YouTube provides when you click the Share button. It encounters an error when you provide the full format, just as it expects to see it... 🤯…

The role of AI/LLMs is to augment humans, not to replace them - from @karpathy's Software 3.0 speech for YC School today. An LLM is software, software with the properties of a savant with encyclopaedic knowledge and yet with many cognitive issues (hallucinations, amnesia, flawed…

Refact.ai got the top score on the most popular software engineering benchmark for AI agents (both SWE-bench Verified and SWE-bench Lite)

We tested Gemini 2.5 Pro with refact.ai on the Polyglot benchmark. The model from Google got 82.2%, behind Claude 3.7 Sonnet’s 93.3%. In the video I explain how our model tests differ from Aider's.


Will be presenting live today at 5pm CEST linkedin.com/posts/nfrolov_…

