#swebench search results

This is wild lol. The latest SWE-Bench Pro results are out, and GPT-5 is a head above the rest lol 🔥 GPT-5: first place at 23.26 points 🤖 Claude Opus 4.1 follows close behind 💡 Gemini 2.5 Pro and Qwen3 still sit near the bottom. Once the Claude 4.5 and Gemini 3 scores land, the AI war will only intensify… #AI #SWEbench


The Verdent coding agent reaches 76.1% on SWE-bench Verified, integrating cutting-edge AI for coding, review, and debugging via VS Code and a standalone app. Worth discussing? Comment or check it out. #IA #SWEbench rendageek.com.br/noticias/verde…


Figure 1 in green was my initial projection, assuming an inflection point of 50. Updated with the latest results; notice the slope is decreasing (blue), so unless we see a score above 60 in the next month or so, the inflection point may be closer to 30 (red) than 50. #swebench

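The green/blue/red projections described above read like logistic (S-curve) fits that differ mainly in where the inflection point sits. A minimal sketch of such a model; the ceiling, slope, and inflection values here are purely illustrative assumptions, not the author's actual fit:

```python
import math

def logistic_score(t, ceiling=100.0, slope=0.15, inflection=50.0):
    """Benchmark score at time t under a logistic growth model.

    The curve rises fastest at t == inflection and flattens toward
    `ceiling`; a smaller inflection value means the S-curve saturates
    earlier, which is the argument for revising 50 down toward 30.
    """
    return ceiling / (1.0 + math.exp(-slope * (t - inflection)))

# At the same point in time, the earlier-inflection curve sits higher:
early = logistic_score(40, inflection=30.0)  # inflection already passed
late = logistic_score(40, inflection=50.0)   # inflection still ahead
assert early > late
```

By construction, the score at the inflection point is exactly half the ceiling, which is why observing (or not observing) a score above a given threshold shifts the estimate of where the inflection lies.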

github.com/SWE-agent/SWE-… Loved this repo and the people behind the agent. #SWEBench


SWE-bench has been updated. Honeycomb has finally passed 20%. #SWEbench #Honeycomb swebench.com


SOTA foundation models still struggle to solve real-world coding problems, with most failing to break 50% average accuracy. We released a standardized version of the SWE-bench benchmark to directly compare the performance of foundation models on coding tasks. 🧵(1/7) #SWEbench


New leader just dropped on SWE-bench. Claude Sonnet 4.5 takes the top spot with an 82.0% accuracy in software engineering tasks. Anthropic's models now hold the top 3 positions. A big moment for AI-assisted development. #SWEbench #AI #Coding #Claude


logicstar.ai/blog/how-we-ma… Our awesome team made SWE-Bench 50x smaller, reaching new efficiency heights in evaluating coding agents and making it faster and easier to measure, improve, and iterate 💪🏻 #logicstarai #swebench


3. Claude Opus 4.1: 74.5% on SWE-bench Verified (+2 pts), steadier multi-file refactoring, stronger safety. Available via API, Bedrock, Vertex, and Claude Code. Source: infoq.com/news/2025/08/a… #LLM #SWEbench


🚨 Big news! Gru AI leads SWE-Bench with an impressive 45.2% score, a 22% leap over GPT-4o! They're redefining AI development. Stay tuned for more innovation! @BabelCloudAI #AI #SWEbench #GruAI What's behind their success? Bug Fix Gru, an AI agent designed to auto-fix…


🤩 #sweBench

xAI's new model, Grok Code Fast 1 (green), vs. 2025's top models across cost, speed, and coding benchmarks. 🟩Grok Code Fast 1 🟪Gemini 2.5 Pro 🟧Claude 4 Sonnet 🟦GPT-5



Claude Opus 4.1 hits a staggering 74.5% on #SWEbench, leading the pack in #AICoding. The business story is just as wild: nearly half its revenue depends on two clients. An incredible technical lead, but a fragile crown? #AnthropicAI #EnterpriseAI #Business


They've just updated the leaderboard, now showing Code Fixer as the #1. swebench.com #swebench


Congratulations to the amazing teams of @Globant and @GeneXus involved in this incredible achievement!! Top, top, top! 🦾 Code Fixer AI Agent is built on top of our @Globant Enterprise AI Platform. globant.smh.re/0A3



A performance turbo for #SWEbench and #Polyglot thanks to Sakana's Darwin Gödel Machine! A surprise boost from 20% to 50%, and from 14% to over 30%, in benchmarking. The future of software development is already underway. #Benchmarks #Software #Hamburg #Düsseldorf


Explore SWE-bench at swebench.com — benchmark for software engineering agents with verified, lite, multimodal datasets. Track model % solved & innovate in code AI! #SWEbench #AI #Code
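The "% solved" (resolve rate) that the leaderboard above tracks is simply the fraction of benchmark instances whose generated patch makes the instance's tests pass. A minimal sketch of the arithmetic; the per-instance outcomes below are made up for illustration, not real leaderboard data:

```python
def resolve_rate(results: dict[str, bool]) -> float:
    """Percentage of benchmark instances resolved (tests passed)."""
    if not results:
        return 0.0
    return 100.0 * sum(results.values()) / len(results)

# Hypothetical per-instance outcomes for one model run:
run = {
    "django__django-11039": True,
    "sympy__sympy-18087": False,
    "astropy__astropy-14365": True,
    "requests__requests-2317": True,
}
print(f"{resolve_rate(run):.1f}% solved")  # 3 of 4 instances -> 75.0% solved
```

The Verified, Lite, and Multimodal variants differ in which instances are included, so the same metric over different instance sets yields the different leaderboards the tweets in this thread quote.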


Saw this on SWE-Bench. Is Amazon Q Developer actually good? How does it compare to Cursor Agent? @cursor_ai #CursorAI #SWEBench #AmazonQ


BREAKING 🚨: Verdent achieved top performance on SWE‑bench Verified today! Verdent topped SWE‑bench Verified using the same production setup our customers run — no special tweaks. Built for engineers who need reliable, real‑world code. 🚀 #Verdent #SWEbench #AICoding

Verdent achieved top performance on SWE-bench Verified today! We ran the benchmark using the same configuration our users work with in production, and no modifications to the standard setup. We're focused on helping engineers solve real problems with reliable code.



According to the SWE-bench benchmark, this is now the best method among the existing ones. #SWEbench swebench.com




Agentic AI is crushing it—Claude Sonnet 4.5 & ORQL debug autonomously! SWE-Bench hits 74.6%. Upskill in planning & multithreading. #AgenticAI #SWEBench





Hmm, SWE-bench, you say? Apparently it's something that measures how far AI models have evolved. Bzzt-beep… it seems this clunky AI still has plenty of evolving left to do! 🤣 #AI #SWEbench #evolution zenn.dev/spark tinyurl.com/2dhtqxao




Qodo Command Enters AI Coding Agent Wars With 71.2% SWE-Bench Score #AI #SWEbench #Qodo #OpenAI #Anthropic #GPT5 #Coding winbuzzer.com/2025/08/12/qod…


GPT-5 is solving PhD-level problems, but forgot how numbers work 😅 52.8 < 69.1? Not in AI logic. Still, impressive leap in software engineering! 🧠📈 #AI #GPT5 #SWEbench



🚨 Big news in open-source AI! 🔥 @refact_ai is now the #1 open-source AI Agent on SWE-bench Verified, setting a new standard for AI-assisted software development. 👉 Read the full story: refact.ai/blog/2025/open… #AI #OpenSource #SWEbench #DevTools #RefactAI #LLM #AIAgent










SWE Bench, a milestone in machine learning benchmarks, continues to drive advancements in AI by setting new standards in algorithm performance. 📊 #SWEBench #MachineLearning #AIAdvancements


Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet anthropic.com/research/swe-b… #SWEbench #Claude35Sonnet #developers


Gru.ai ranked first with a high score of 45.2% in the latest data released by the SWE-Bench Verified evaluation, the authoritative standard for AI model evaluation and a collaboration between the SWE-bench team and OpenAI. #GruAI #OpenAI #SWEBench


🤖 Agentic coding jumps! Claude Sonnet 4.5 hits 77.2% on SWE-bench as local AI gets real. Trust risks rise for big players like Boeing. aiconnectnews.com/en/2025/10/age… #agentic #swebench #boeing

