#benchmarks search results

✨ Taking the stage, @Jared_Spataro here to share his thoughts and insights about “A New Frontier: Building the Future Firm with #AI” #BenchMarks 📈📉📊 #M365Con Day Three Keynote

Achieving SOTA AI benchmarks in 2024 AI researchers: nobody is gonna know #ai #benchmarks


Open Deep Search (ODS) isn’t theory.

It’s already outperforming closed labs:
- FRAMES: 75.3%
- SimpleQA: 88.3%

That’s Sentient’s power: research that’s open, benchmarked, and winning.

@SentientAGI @sentient_chat

#SentientAGI #Benchmarks

The fastest open-source LLM #inference stack just landed. Check out our latest #benchmarks that leave vLLM and Fireworks in the dust. 🏎️💨

Our blog has all the juicy details, but here's the 30-sec version:
⚡ Up to 4× lower P50/P95 latency on the same #H100 & L40S GPUs
📈…

3️⃣ Kamil Chodoła (@ChodoKamil) provided a deep dive into performance #benchmarks and testing strategies, showcasing the rigorous processes involved in upcoming upgrades. 🔧

Pixel 10 Pro XL pulls a 95% stability on Wild Life Extreme Stress Test 🔥
Best loop: 3252 | Lowest loop: 3094

Google finally nailed thermal performance – no wild throttling here. 💪📱 #Pixel10Pro #Benchmarks
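
The 95% figure is consistent with the two loop scores quoted in the post, assuming the stress test reports stability as the lowest loop score divided by the best loop score. A minimal Python sketch of that arithmetic follows; the function name `stability_percent` is illustrative and not part of any benchmark tooling.

```python
def stability_percent(best_loop: float, lowest_loop: float) -> float:
    """Return stability as lowest loop score / best loop score, in percent.

    Assumption: this ratio is how the stress test defines stability.
    """
    if best_loop <= 0:
        raise ValueError("best_loop must be positive")
    return 100.0 * lowest_loop / best_loop


# Loop scores quoted in the post.
print(f"{stability_percent(3252, 3094):.1f}%")  # prints 95.1%
```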

I camped overnight outside of Microcenter and got my hands on an #RTX5080 #ffxiv #benchmarks #ff14

I camped overnight outside of Microcenter to get an RTX 5080! Here's how it runs on #FFXIV youtu.be/D_CgrIaw1nU?si…

You can now filter the LLM benchmark list by size. Here are the top XS models (< 2B parameters) furukama.com/llm-bob/?size=… #benchmarks #artificialintelligence

GPT-5 vs Grok 4 - SkateBench

→ GPT-5: 98.6% accuracy | $0.07 cost
→ Grok 4: 79% accuracy | $4.86 cost

GPT-5 is:
→ 14× cheaper
→ More accurate
→ Much faster

This is precision at scale.
That is burn rate with lag.

#GPT5 #LLM #Benchmarks

From paying the highest-ever dividend in #FY24 to winning the #BMMunjalAward, we set new #benchmarks in building a sustainable, resilient India. 🏆 As we bid farewell to #2024, we eagerly embrace the endless possibilities that lie ahead! 🙌 #HUDCOImpact #Throwback2024


Snapdragon 8 Elite Gen 5 benchmarks are CRAZY! 📈 #qualcommsnapdragon #qualcomm #benchmarks


The TikTok ban grace period expires this week. A new study shows Meta ad prices soared during the previous brief TikTok outage – hurting small businesses the most. go.tigerpistol.com/3R2xihZ #FranchiseMarketing #TikTok #Benchmarks #LocalAdvertising #DigitalMarketing

VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction openreview.net/forum?id=6V3Ym… #benchmarks #strides #learning


#Excellence in #Engineering
We are committed to new #benchmarks in quality & reliability.
At #MENASCO, we are #Committed to achieving #excellence through #expertise, #innovation & #precision, delivering engineering #solutions that set new benchmarks in #quality & #reliability.

Your device is too powerful... #S24 ... never seen that before #benchmarks #snapdragon

NVIDIA Blackwell Outshines in InferenceMAX™ v1 Benchmarks NVIDIA's Blackwell architecture demonstrates significant performance and efficiency gains in SemiAnalysis's InferenceMAX™ v1 benchmarks, setting new standa ➤ jmpto.net/pFyef #Benchmarks #Inferencemax #Nvidia


Dextr: Zero-Shot Neural Architecture Search with Singular Value Decomposition and Extrinsic Curva... Rohan Asthana, Joschua Conrad, Maurits Ortmanns, Vasileios Belagiannis. Action editor: Frederic Sala. openreview.net/forum?id=X0vPo… #cnn #benchmarks #netw


GLM-4.6 benchmarks: Grok 4 third in intelligence at 65, Grok Fast fifth! Solid showing vs. GPT-5 top spots. Speed/price balanced. 📊 @xai @ArtificialAnlys #AI #Benchmarks artificialanalysis.ai/models/glm-4-6…

Policies and regulations change fast. Are you ready when they do? Benchmarks helps you stay informed, turn policy into action, and make your voice count. Don’t get caught off guard—lead with confidence. #BecomeAMemeber #Benchmarks

11/ What we still need: rigorous benchmarks, domain-specific safety models, continuous behavioral audits, and real oversight rails. Speed without stewardship isn’t progress. #Benchmarks #Standards #AICompliance


2. Verifiers: This method lets the LLM give a free-form answer (like in math or code), and a tool checks if the final result is correct. It's a step up from multiple-choice but only works for problems with a clear right or wrong answer. #MachineLearning #Benchmarks
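
A minimal Python sketch of the verifier idea described in the post, assuming a math-style task where the model's free-form answer ends with a number. The helper names (`extract_final_number`, `verify`) and the sample response are hypothetical; real benchmarks use more careful answer parsing.

```python
import re
from typing import Optional


def extract_final_number(response: str) -> Optional[float]:
    """Return the last number in a free-form response, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", response)
    return float(matches[-1]) if matches else None


def verify(response: str, expected: float, tol: float = 1e-9) -> bool:
    """Check only the final result; the reasoning text itself is not graded."""
    answer = extract_final_number(response)
    return answer is not None and abs(answer - expected) <= tol


# Hypothetical model output for "What is 17 * 6?"
response = "17 * 6 = 17 * 5 + 17 = 85 + 17, so the answer is 102."
print(verify(response, expected=102))  # prints True
```

The check passes or fails on the final value alone, which is why this approach only suits problems with a clear right or wrong answer.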

Claude Sonnet 4.5 just topped SWE-bench Verified (n=500) with 82% accuracy — outperforming Opus 4.1, Sonnet 4, GPT-5 Codex, GPT-5, and Gemini 2.5 Pro. Software engineering benchmark results are clear: Sonnet 4.5 leads. #AI #SoftwareEngineering #Benchmarks #Craftvideo

#DiabloIV #Benchmarks

- 38 GPUs tested ✅ (yesterday)
- 14 CPUs tested ✅ (today)

Enjoy!

computerbase.de/2023-06/diablo…

Sabakan's musings: I tried running the benchmark that's apparently trending right now #MonsterHunterWilds #Benchmarks

We just released our evaluation of @MistralAI Medium 3 across all of our benchmarks! 🧵(1/6) #AI #LLM #Benchmarks

Do you know how the Google PaLM2 model powering Bard compares to other LLMs? 🤔 Tomorrow at GitHub SF I will compare publicly available benchmarks for PaLM2, GPT-4, GPT-3.5 and Llama2 representing open source! RSVP now! Last seats 👉🏻 meetup.com/graphql-sf/eve… #ai #benchmarks ✨🚀

#benchmarks Places of worship across the island of Ireland bear a tangible link to the legacy of the Ordnance Survey, which mapped Ireland nearly 200 years ago. The survey completed the world’s first large-scale mapping of an entire country.

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibili... Zachary S Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan tmlr.infinite-conf.org/paper_pages/Bs… #benchmark #benchmarks #ai

It's important to use proper #benchmarks and #evaluation methods to validate your #models, especially for time series

🚀 New benchmarks are live for @reactnative 0.77! Compare how your current React Native version stacks up against 0.77 at reactnativebenchmark.dev Huge thanks to everyone contributing to React Native! 🙌 #ReactNative #Benchmarks

Curious about how the latest react-native version 0.76.0-rc.0 is performing on benchmarks? Check out our new dashboard by @Dream11Engg that gives you insights and comparisons between all versions starting from 0.73. dream-sports-labs.github.io/rn-benchmarkin… @reactnative #benchmarks
