Fabricated Knowledge

@_fabknowledge_

Simplifying the world of semiconductor investing in the age of AI. Part of the @semianalysis_ gang.

Fabricated Knowledge reposted

On GPT-OSS 120B documentation summarization scenario, MI355X vLLM is seeing competitive perf per TCO compared to B200 vLLM for below 210 tok/s/user interactivity. For above 210 tok/s/user, we are seeing B200 vLLM & B200 trtllm having an advantage on the current software. There is…

Markets never bottom on a Friday lmao


Fabricated Knowledge reposted

what GB200 NVL72 does to a mfer

All results can be accessed at inferencemax.ai. The code is open-sourced here: github.com/InferenceMAX/I… Methodology and explanation of results are here: newsletter.semianalysis.com/p/inferencemax…



The fed should buy Hyperscaler LT maturities for easing LMAO


Fabricated Knowledge reposted

the results are interesting to review, showing a pareto frontier between throughput and e2e latency, or throughput and interactivity (tok/sec per user). moving to the pareto frontier means serving more users or delivering faster responses with the same infrastructure

Fabricated Knowledge reposted

the industry needs an open-source, automated inference benchmark that moves at the same speed as the AI software ecosystem: inferencemax.ai

There are paper specs and real specs. Today’s the first day we see real-world performance at scale! Excited to see this evolve over time!

InferenceMAX™: Open Source Inference Benchmarking
Support from OpenAI, @LisaSu, @AnushElangovan, @ia_buck, @tri_dao, and many more.
NVIDIA GB200 NVL72, AMD MI355X
Throughput Token per GPU, Latency Tok/s/user
Perf per Dollar, Cost per Million Tokens, Tokens per Provisioned…


One random viz I wish I could see: when I’m on the subway, the explosion of RF as a train pulls into a new station. I can’t imagine what it looks like. It’s probably pure chaos, and I legit wish I could see RF to witness it.


Fabricated Knowledge reposted

📣 NVIDIA Blackwell sets the standard for AI inference on SemiAnalysis InferenceMAX. Our most recent results on the independent benchmarks show NVIDIA’s Blackwell Platform leads AI factory ROI. See how NVIDIA Blackwell GB200 NVL72 can yield $75 million in token revenue over…

I really do not think people appreciate what this is: there has never been a source of truth for GPU throughput. Specs on paper have never meant anything. This is IT!!!

All results can be accessed at inferencemax.ai. The code is open-sourced here: github.com/InferenceMAX/I… Methodology and explanation of results are here: newsletter.semianalysis.com/p/inferencemax…



Fabricated Knowledge reposted

Today we are launching InferenceMAX! We have support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, Oracle, CoreWeave, TogetherAI, Nebius, Crusoe, HPE, SuperMicro, Dell. It runs every day on the latest software (vLLM, SGLang, etc.) across hundreds of GPUs, $10Ms of…

Going to be dropping something huge in 24 hours. I think it'll reshape how everyone thinks about chips, inference, and infrastructure. It's directly supported by NVIDIA, AMD, Microsoft, OpenAI, Together AI, CoreWeave, Nebius, PyTorch Foundation, Supermicro, Crusoe, HPE, Tensorwave,…



SemiAnalysis is back on Substack! open.substack.com/pub/semianalys… And we are coming with the biggest piece we’ve done in the AI space: welcome to InferenceMAX. Want to know how AMD does versus NVDA? Here’s the answer.


INFERENCE MAXXXXX!!!


Fabricated Knowledge reposted

We estimate that Claude Sonnet 4.5 has a 50%-time-horizon of around 1 hr 53 min (95% confidence interval of 50 to 235 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min.

Fabricated Knowledge reposted

China’s State Council on October 9 approved Order No. 61 of 2025, announcing export controls on certain overseas rare-earth items. This marks the fourth round of rare-earth export restriction efforts; the previous round was on April 8. (1/8)🧵

Fabricated Knowledge reposted

I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.


I'm hoping this goes off smoothly today, without a hitch, just like our transition back to Substack did. That being said, SOOOOOON TM

Going to be dropping something huge in 24 hours I think it'll reshape how everyone thinks about chips, inference, and infrastructure It's directly supported by NVIDIA, AMD, Microsoft, OpenAI, Together AI, CoreWeave, Nebius, PyTorch Foundation, Supermicro, Crusoe, HPE, Tensorwave,…



There’s another way to think about it. If one product is on 2nm and the other is on 3nm, yet one has better performance, they call the difference “margin.”

🚨Lisa Su dropped a bombshell Yet nobody has caught it $AMD's MI450 will use 2nm technology, while $NVDA's Rubin will use 3nm A massive power and efficiency advantage This is breaking news, and I don’t understand why nobody is reporting it


