#benchmarks search results

✨ Taking the stage, @Jared_Spataro here to share his thoughts and insights about “A New Frontier: Building the Future Firm with #AI” #BenchMarks 📈📉📊 #M365Con Day Three Keynote

[3 tweet images, posted by MSFTAdoption]

Snapdragon 8 Elite Gen 5 benchmarks are CRAZY! 📈 #qualcommsnapdragon #qualcomm #benchmarks


Achieving SOTA AI benchmarks in 2024 AI researchers: nobody is gonna know #ai #benchmarks


3️⃣ Kamil Chodoła (@ChodoKamil) provided a deep dive into performance #benchmarks and testing strategies, showcasing the rigorous processes involved in upcoming upgrades. 🔧

[Tweet image, posted by ECHInstitute]

Open Deep Search (ODS) isn’t theory. It’s already outperforming closed labs:
- FRAMES: 75.3%
- SimpleQA: 88.3%
That’s Sentient’s power: research that’s open, benchmarked, and winning.
@SentientAGI @sentient_chat #SentientAGI #Benchmarks

[Tweet image, posted by BananaYellow88]

Pixel 10 Pro XL pulls a 95% stability on Wild Life Extreme Stress Test 🔥
Best loop: 3252 | Lowest loop: 3094
Google finally nailed thermal performance – no wild throttling here. 💪📱 #Pixel10Pro #Benchmarks

[Tweet image, posted by iphonesickness]
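
The 95% figure in the Pixel 10 Pro XL post can be reproduced from the two loop scores it quotes. The sketch below assumes that stability is the lowest loop score expressed as a percentage of the best loop score, which is how stress tests of this kind usually report it; the function name `stability_percent` is made up for illustration and is not part of any benchmark tool.

```python
def stability_percent(loop_scores):
    """Return the lowest loop score as a percentage of the best loop score.

    Assumption: stability = lowest / best * 100. This is an illustrative
    helper, not an official benchmark API.
    """
    if not loop_scores:
        raise ValueError("need at least one loop score")
    best = max(loop_scores)
    lowest = min(loop_scores)
    return 100.0 * lowest / best


# Only the best and lowest loops are quoted in the post.
scores = [3252, 3094]
print(f"{stability_percent(scores):.1f}%")  # prints 95.1%
```

A result close to 100% means the last loops scored nearly as well as the best one, which is what "no wild throttling" refers to.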

I camped overnight outside of Microcenter and got my hands on an #RTX5080 #ffxiv #benchmarks #ff14

I camped overnight outside of Microcenter to get an RTX 5080! Here's how it runs on #FFXIV youtu.be/D_CgrIaw1nU?si…

[Tweet image, posted by AlpacaLips_]


VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction openreview.net/forum?id=6V3Ym… #benchmarks #strides #learning


The fastest open-source LLM #inference stack just landed. Check out our latest #benchmarks that leave vLLM and Fireworks in the dust. 🏎️💨 Our blog has all the juicy details—but here's the 30-sec version: ⚡ Up to 4× lower P50/P95 latency on the same #H100 & L40S GPUs 📈…

[Tweet image, posted by predibase]

You can now filter the LLM benchmark list by size. Here's the top XS models (< 2B parameters) furukama.com/llm-bob/?size=… #benchmarks #artificialintelligence

[Tweet image, posted by furukama]

From paying the highest-ever dividend in #FY24 to winning the #BMMunjalAward, we set new #benchmarks in building a sustainable, resilient India. 🏆 As we bid farewell to #2024, we eagerly embrace the endless possibilities that lie ahead! 🙌 #HUDCOImpact #Throwback2024


GPT-5 vs Grok 4 - SkateBench
→ GPT-5: 98.6% accuracy | $0.07 cost
→ Grok 4: 79% accuracy | $4.86 cost
GPT-5 is:
→ 14× cheaper
→ More accurate
→ Much faster
This is precision at scale. That is burn rate with lag.
#GPT5 #LLM #Benchmarks

[Tweet image, posted by 0xhyke]


#Excellence in #Engineering: we are committed to new #benchmarks in quality & reliability. At #MENASCO, we are #Committed to achieving #excellence through #expertise, #innovation & #precision, delivering engineering #solutions that set new benchmarks in #quality & #reliability.

[2 tweet images, posted by menasco_uae]

Your device is too powerful... #S24 ... never seen that before #benchmarks #snapdragon

[3 tweet images, posted by TheManInBlackZ]

NVIDIA Blackwell Outshines in InferenceMAX™ v1 Benchmarks NVIDIA's Blackwell architecture demonstrates significant performance and efficiency gains in SemiAnalysis's InferenceMAX™ v1 benchmarks, setting new standa ➤ jmpto.net/pFyef #Benchmarks #Inferencemax #Nvidia


Dextr: Zero-Shot Neural Architecture Search with Singular Value Decomposition and Extrinsic Curva... Rohan Asthana, Joschua Conrad, Maurits Ortmanns, Vasileios Belagiannis. Action editor: Frederic Sala. openreview.net/forum?id=X0vPo… #cnn #benchmarks #netw


GLM-4.6 benchmarks: Grok 4 third in intelligence at 65, Grok Fast fifth! Solid showing vs. GPT-5 top spots. Speed/price balanced. 📊 @xai @ArtificialAnlys #AI #Benchmarks artificialanalysis.ai/models/glm-4-6…

[Tweet image, posted by TonyMidtrud]

Policies and regulations change fast. Are you ready when they do? Benchmarks helps you stay informed, turn policy into action, and make your voice count. Don’t get caught off guard—lead with confidence. #BecomeAMemeber #Benchmarks

[Tweet image, posted by BenchmarksNC]

11/ What we still need: rigorous benchmarks, domain-specific safety models, continuous behavioral audits, and real oversight rails. Speed without stewardship isn’t progress. #Benchmarks #Standards #AICompliance


2. Verifiers: This method lets the LLM give a free-form answer (like in math or code), and a tool checks if the final result is correct. It's a step up from multiple-choice but only works for problems with a clear right or wrong answer. #MachineLearning #Benchmarks

[Tweet image, posted by WaghHimanshu]
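
The verifier idea in the post above can be illustrated with a small sketch. It assumes a math-style task where the final result is the last number in the model's free-form answer; the names `extract_final_number` and `verify` are hypothetical and do not come from any particular evaluation library.

```python
import re


def extract_final_number(answer: str):
    """Return the last number mentioned in a free-form answer, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", answer)
    return float(matches[-1]) if matches else None


def verify(answer: str, expected: float, tol: float = 1e-9) -> bool:
    """Check only the final result; the reasoning text itself is not graded."""
    final = extract_final_number(answer)
    return final is not None and abs(final - expected) <= tol


print(verify("First 12 * 7 = 84, then add 6, so the result is 90.", 90))  # True
print(verify("I think it is about 88", 90))  # False
```

This also shows the limitation the post mentions: the check works only because the task has one clearly correct final value.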

Claude Sonnet 4.5 just topped SWE-bench Verified (n=500) with 82% accuracy — outperforming Opus 4.1, Sonnet 4, GPT-5 Codex, GPT-5, and Gemini 2.5 Pro. Software engineering benchmark results are clear: Sonnet 4.5 leads. #AI #SoftwareEngineering #Benchmarks #Craftvideo

[Tweet image, posted by AgenticLabsLtd]

#DiabloIV #Benchmarks
- 38 GPUs tested ✅ (yesterday)
- 14 CPUs tested ✅ (today)
Enjoy! computerbase.de/2023-06/diablo…

[Tweet image, posted by ComputerBase]

A server admin's grumblings: I tried running the benchmark that seems to be trending right now. #MonsterHunterWilds #Benchmarks

[Tweet image, posted by yanoyano4649]

#benchmarks Places of worship across the island of Ireland bear a tangible link to the legacy of the Ordnance Survey, which mapped Ireland nearly 200 years ago. The OS completed the world’s first large-scale mapping of an entire country.

[4 tweet images, posted by Heritage_Nina]

Do you know how Google PaLM2 model powering Bard compares to other LLMs? 🤔 Tomorrow at GitHub SF I will compare publicly available benchmarks for PaLM2, GPT-4, GPT-3.5 and Llama2 representing open source! RSVP now! Last seats 👉🏻 meetup.com/graphql-sf/eve… #ai #benchmarks ✨🚀

[Tweet image, posted by gerardsans]

We just released our evaluation of @MistralAI Medium 3 across all of our benchmarks! 🧵(1/6) #AI #LLM #Benchmarks

[Tweet image, posted by _valsai]

📢 Excited to share that COBIAS has been accepted at #WebSci25! 🎉 Our work aims to quantify the contextual #quality of LLM-bias #benchmarks. w/ @priyanshul1202 @jain_hemang112 @VictorKnox99 @i_amanchadha @manasgaur90 @DeySanorita 📜 arxiv.org/abs/2402.14889 Findings 🧵⬇️

[Tweet image, posted by ponguru]

The TikTok ban grace period expires this week. A new study shows Meta ad prices soared during the previous brief TikTok outage – hurting small businesses the most. go.tigerpistol.com/3R2xihZ #FranchiseMarketing #TikTok #Benchmarks #LocalAdvertising #DigitalMarketing

[Tweet image, posted by TigerPistol]

Updated - Futuremark SystemInfo is a #freeware utility used to identify the #hardware in your system and is used for many of Futuremark's #benchmarks. majorgeeks.com/files/details/…

[Tweet image, posted by majorgeeks]

It's important to use proper #benchmarks and #evaluation methods to validate your #models, especially for time series

[Tweet image, posted by PyLadiesParis]

Updated - #Futuremark SystemInfo is a #freeware utility used to identify the hardware in your system and is used for many of Futuremark's #benchmarks. majorgeeks.com/files/details/…

[Tweet image, posted by majorgeeks]

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibili... Zachary S Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan tmlr.infinite-conf.org/paper_pages/Bs… #benchmark #benchmarks #ai

[Tweet image, posted by TmlrVideos]
