#llminference search results
Llama3.1-70B now runs at 560 tokens/s on Cerebras Inference API. @CerebrasSystems is the fastest AI inference and training platform in the world. #llama3 #llminference #llms #nlproc #deeplearning
With support for a variety of PCIe GPUs and configurations, #QuantaGrid D75E-4U is ideal for enterprises seeking to fulfill #GenAI, #LLMInference & #ComputerVision workloads with a cost-performance balanced AI cluster. Visit @QuantaQCT at #SC25 to learn more. #SuperComputing2025
Nebius-Fast ~400 t/s 🚀 artificialanalysis.ai/models/deepsee… #DeepSeekR1 #LLMinference
Speed isn’t luck, it’s good engineering. Nebius AI Studio is the fastest place to run DeepSeek R1 0528 (per @ArtificialAnlys) Blackwell B200 + speculative decoding + smart FP4 on our vertically integrated stack. Sub-second first token. Real throughput. 👇
4/5 ⚙️ Cold Start Problem in AI Inference: @charles_irl explains: Serverless = great for bursty use cases, but cold starts add latency. Modal’s stack minimizes cold start times—ideal for production AI. #LLMInference #AIOptimization
We’re entering a new era for #LLMInference: moving beyond single-node optimizations to distributed serving strategies that unlock better performance, smarter resource utilization, and real cost savings. In this blog post, we break down the latest techniques for distributed LLM…
Blessing set up training models blended 3:1! Models: llama-3.2-1b vs llama-3.2-3b. (1) llama-3.2-1b costs $0.12 per 1M tokens; (2) llama-3.2-3b costs $0.30 per 1M tokens. meta-llama/llama-3.2-3b-instruct/fp-16-fast-ollama-1 — worth it or not? #llminference #llama #ai #swarm
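A quick back-of-envelope check of that comparison, assuming "3:1" means three parts 1B traffic to one part 3B traffic (my reading of the post, not stated explicitly); the prices are the ones quoted above.

```python
# Hypothetical sketch: blended $/1M tokens for a 3:1 mix of llama-3.2-1b and
# llama-3.2-3b, using the prices quoted in the post. The 3:1 interpretation
# (3 parts 1B traffic to 1 part 3B traffic) is an assumption.
price = {"llama-3.2-1b": 0.12, "llama-3.2-3b": 0.30}  # $ per 1M tokens
mix = {"llama-3.2-1b": 3, "llama-3.2-3b": 1}          # assumed traffic split

total_parts = sum(mix.values())
blended = sum(price[m] * parts / total_parts for m, parts in mix.items())
print(f"blended: ${blended:.3f} per 1M tokens "
      f"(vs ${price['llama-3.2-3b']:.2f} for 3B-only)")  # ~0.165 vs 0.30
```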
This week is the #OCPGlobalSummit & we're there to discuss #LLMinference acceleration! Demos show >2X performance over vLLM, plus: 80-90% higher efficiency, 50% TCO improvements, 50% lower DC cooling costs, and 50% less CO2 emissions. Reach us at [email protected] if you'd like to meet.
Join us next week at FMS to see how we’re tackling data challenges for #GenAI Apps w/ a preview of our upcoming product, XDP LightningAI, which enhances AI #LLMInference efficiency by 80-90%! Contact us at [email protected] to schedule a time or visit our booth 1045. See you there!
#FMS2024 (now the Future of Memory and Storage) is coming up next month … and we'll be there! Many great things in store – including a preview of our forthcoming product that enhances AI #LLMinference efficiency by 80-90%, making it the fastest inference solution on the market.
Everyone’s talking about speculative decoding for faster #LLMinference. But do you know: the actual speedup 𝗱𝗲𝗽𝗲𝗻𝗱𝘀 𝗵𝗲𝗮𝘃𝗶𝗹𝘆 𝗼𝗻 𝘁𝗵𝗲 𝗗𝗥𝗔𝗙𝗧 𝗺𝗼𝗱𝗲𝗹 𝘆𝗼𝘂 𝘂𝘀𝗲. Three metrics can be used to evaluate the performance of the draft model: - Acceptance rate…
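Since the post above is truncated, here is a minimal, self-contained sketch of the acceptance-rate metric it mentions. The toy draft/target distributions are my own stand-ins; with real models you would replace `target_dist`/`draft_dist` with actual next-token distributions.

```python
# Sketch (assumptions: toy distributions stand in for real draft/target models):
# estimate a draft model's acceptance rate under the standard speculative-sampling
# accept/reject rule, then convert it into expected tokens per target-model pass.
import numpy as np

VOCAB = 100

def dist(ctx, temperature):
    """Deterministic toy next-token distribution: hash the context into logits."""
    rng = np.random.default_rng(abs(hash(tuple(ctx))) % (2**32))
    logits = rng.standard_normal(VOCAB) / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

def target_dist(ctx):  # "large" model: sharper distribution
    return dist(ctx, temperature=0.5)

def draft_dist(ctx):   # "small" draft model: correlated but blurrier
    return 0.7 * target_dist(ctx) + 0.3 * dist(ctx, temperature=2.0)

def acceptance_rate(steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    ctx, accepted = [0], 0
    for _ in range(steps):
        q, p = draft_dist(ctx), target_dist(ctx)
        x = rng.choice(VOCAB, p=q)                    # draft proposal
        if rng.random() < min(1.0, p[x] / q[x]):      # accept with prob min(1, p/q)
            accepted += 1
            ctx.append(int(x))
        else:                                         # reject: resample from residual
            resid = np.maximum(p - q, 0)
            ctx.append(int(rng.choice(VOCAB, p=resid / resid.sum())))
    return accepted / steps

if __name__ == "__main__":
    alpha, k = acceptance_rate(), 4
    # With acceptance rate alpha and k drafted tokens per step, expected accepted
    # tokens per target pass is roughly (1 - alpha**(k+1)) / (1 - alpha).
    print(f"acceptance rate ~ {alpha:.2f}, "
          f"expected tokens/pass (k={k}) ~ {(1 - alpha**(k + 1)) / (1 - alpha):.2f}")
```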
🚀 "Want to ensure your LLM-powered application runs smoothly in real-time? This guide covers everything from hardware choices to software optimizations that can make a huge difference. buff.ly/3zojOrJ #LLMinference #AI"
Tired of hitting your ChatGPT limit? bentoml.com/blog/chatgpt-u… Even paid users face hidden message caps, downgrades, and throttling. Here’s why those limits exist and how to remove them. Read the full guide 👇 #ChatGPT #LLMInference #OpenSourceAI #BentoML
🎯🌱 • If your evals wander across runs, check batch invariance before hunting seeds. #LLMInference #AIEngineering #Reproducibility #AIArchitecture #GenerativeAI
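For context on what "check batch invariance" means in practice, a minimal sketch (model choice and prompts are my own assumptions, not from the post): score the same prompt at batch size 1 and inside a batch of two, and see whether the logits, and therefore greedy choices, drift.

```python
# Sketch: the same prompt is scored alone and packed into a batch of 2.
# If the logits drift between the two runs, greedy ties can flip and evals
# "wander" across runs even with a fixed seed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                      # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The capital of France is"
other = "A much longer distractor prompt that forces padding in the batch."

with torch.no_grad():
    # Batch size 1.
    single = tok(prompt, return_tensors="pt")
    logits_1 = model(**single).logits[0, -1]

    # Same prompt inside a batch of 2 (shorter row gets right-padded).
    batch = tok([prompt, other], return_tensors="pt", padding=True)
    row = 0
    last = int(batch["attention_mask"][row].sum().item()) - 1  # last real token
    logits_2 = model(**batch).logits[row, last]

drift = (logits_1 - logits_2).abs().max().item()
print(f"max |logit drift| between batch sizes: {drift:.3e}")
print("greedy token stable:", logits_1.argmax().item() == logits_2.argmax().item())
```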
ShadowKV: A High-Throughput Inference System for Long-Context LLM Inference itinai.com/shadowkv-a-hig… #ShadowKV #LLMInference #AIOptimization #DataEfficiency #MachineLearning #ai #news #llm #ml #research #ainews #innovation #artificialintelligence #machinelearning #technology #…
We extend our sincere thanks to Prof. Sengupta for an engaging and enriching session! #ResearchSeminar #LLMInference #DoMSResearch #Academia #KnowledgeSharing #IITK @IITKanpur
So if you're loading all those weights anyway, you might as well process multiple tokens at once! Draft model predictions are often correct for "easy" tokens, so we skip ahead. Can't believe I'm so outdated tf... Karpathy has been explaining this since 2023! #AIOptimization #LLMInference
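The "loading all those weights anyway" point is the memory-bandwidth argument; a rough back-of-envelope sketch (the hardware and model numbers are illustrative assumptions, not from the post):

```python
# Rough, illustrative numbers (assumptions): plain autoregressive decoding is
# bound by streaming the weights from HBM, so one forward pass costs about the
# same whether it scores 1 token or verifies a short draft of k tokens.
weight_bytes = 70e9 * 2        # e.g. a 70B-parameter model in fp16
hbm_bandwidth = 3.35e12        # e.g. ~3.35 TB/s on an H100 SXM

seconds_per_pass = weight_bytes / hbm_bandwidth
print(f"weight-streaming bound: ~{1 / seconds_per_pass:.0f} passes/s "
      f"≈ tokens/s for plain single-token decoding")

# With speculative decoding, each target-model pass verifies k drafted tokens.
# If a fraction alpha of drafts is accepted, expected tokens per pass is roughly:
k, alpha = 4, 0.7
tokens_per_pass = (1 - alpha ** (k + 1)) / (1 - alpha)
print(f"~{tokens_per_pass:.1f} tokens per target pass at k={k}, acceptance {alpha}")
```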
Using LLM Inference, Apple's Generative AI Improvements. Read more on govindhtech.com/using-llm-infe… #llminference #LargeLanguageModels #AITechnology #AIApplications #AISystems #aimodel #lazyllm #applegenai #CuttingEdgeTechnology #virtualassistants #GenAI #healthcare #News…
This analysis breaks down on-device LLM inference challenges, from compute stages to the unique performance quirks of smartphone storage. - hackernoon.com/why-your-phone… #ondeviceai #llminference
Boost Your LLM Performance: How Stanford’s Optimistic Algorithm Cuts Latency by 5x #LLMInference #AminAlgorithm #ArtificialIntelligence #LatencyReduction #AIOptimization itinai.com/boost-your-llm… The Hidden Bottleneck in LLM Inference In the rapidly evolving landscape of artifi…
Pliops is pleased to announce that our XDP LightningAI has received a “Best of Show” Award from FMS in the #LLMInference Acceleration Category! Additionally, the newest iteration of our Extreme Data Processor, the XDP PRO, is our first ASIC-based chip. bit.ly/3LVftyD
#GenerativeAI is gaining traction. However, deploying AI, especially #LLMinference, needs substantial #GPU and memory resources, increasing power usage & emissions. Our Extreme Data Processor uses GPU key-value I/O to cut power usage & emissions.
Check out this video narrated by our chief AI Scientist Eshcar Hillel, in which she explains how our XDP LightningAI solution accelerates LLM inferencing to drive data center efficiency. Watch here: bit.ly/3YX4b31 #SC24 #LLMInference #DataCenterEfficiency #GenAI
What Is LLM Inference? Batch Inference In LLM Inference Read more on govindhtech.com/what-is-llm-in… #AI #LLM #LLMInference #largelanguagemodels #inferenceLLM #BERT #GPT #artificialntelligence #News #Technews #Technology #Technologynews #Technologytrend #Govindhtech @TechGovind70399
Looking for game-changing efficiency for #LLMinference? Check out XDP LightningAI! This #KeyValue distributed #smartstorage node boosts LLM inference efficiency by 80-90%, cuts TCO by 50%, reduces CO2 emissions by 50%, & slashes DC cooling costs by 50%. bit.ly/3zmW0Va