Fabricated Knowledge

@_fabknowledge_

Simplifying the world of semiconductor investing in the age of AI. Part of the @semianalysis_ gang.

fabricatedknowledge.com

Joined September 2021

11K posts, 25K followers, 736 following

Fabricated Knowledge reposted

SemiAnalysis

@SemiAnalysis_

Oct 10

On GPT-OSS 120B documentation summarization scenario, MI355X vLLM is seeing competitive perf per TCO compared to B200 vLLM for below 210 tok/s/user interactivity. For above 210 tok/s/user, we are seeing B200 vLLM & B200 trtllm having an advantage on the current software. There is…

Fabricated Knowledge

@_fabknowledge_

Oct 10

Markets never bottom on a Friday lmao

Fabricated Knowledge reposted

Lisan al Gaib

@scaling01

Oct 9

what GB200 NVL72 does to a mfer

Dylan Patel

@dylan522p

Oct 9

All results and such can be accessed at inferencemax.ai. The code and everything is open sourced here: github.com/InferenceMAX/I… Methodology and explanation of results are here: newsletter.semianalysis.com/p/inferencemax…

dylan522p's tweet card. NVIDIA GB200 NVL72, AMD MI355X, Throughput Token per GPU, Latency Tok/s/user, Perf per Dollar, Tokens per Provisioned Megawatt, DeepSeek R1 670B, GPTOSS 120B, Llama3 70B

InferenceMAX™: Open Source Inference Benchmarking

Source: newsletter.semianalysis.com

Fabricated Knowledge

@_fabknowledge_

Oct 10

The Fed should buy Hyperscaler LT maturities for easing LMAO

Fabricated Knowledge reposted

Jordan Nanos

@JordanNanos

Oct 9

the results are interesting to review, showing a pareto frontier between throughput and e2e latency, or throughput and interactivity (tok/sec per user). moving to the pareto frontier means serving more users or delivering faster responses with the same infrastructure
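For readers unfamiliar with the term, here is a minimal sketch of how a Pareto frontier over benchmark points could be computed, assuming both axes are higher-is-better (interactivity in tok/s/user, throughput in tok/s per GPU). The function name and the sample numbers are hypothetical and are not taken from InferenceMAX results or code.

```python
from typing import List, Tuple

# (interactivity in tok/s/user, throughput in tok/s per GPU); both higher-is-better.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point beats on both axes at once."""
    # Walk from highest interactivity to lowest; ties put higher throughput first.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    frontier: List[Point] = []
    best_throughput = float("-inf")
    for interactivity, throughput in ordered:
        # A point survives only if it beats the throughput of every point
        # that has at least as much interactivity.
        if throughput > best_throughput:
            frontier.append((interactivity, throughput))
            best_throughput = throughput
    return frontier


if __name__ == "__main__":
    # Made-up configurations, for illustration only.
    sample = [(50, 4000), (100, 3000), (100, 2500), (150, 2800), (200, 1500), (180, 1200)]
    for interactivity, throughput in pareto_frontier(sample):
        print(f"{interactivity:>5} tok/s/user  {throughput:>6} tok/s/GPU")
```

In this toy data, (100, 2500) and (180, 1200) fall off the frontier because another configuration is at least as good on both axes; a configuration that moves onto the frontier gets more throughput at the same interactivity, or more interactivity at the same throughput.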

Fabricated Knowledge reposted

Jordan Nanos

@JordanNanos

Oct 9

the industry needs an open-source, automated inference benchmark that moves at the same speed as the AI software ecosystem: inferencemax.ai

Fabricated Knowledge

@_fabknowledge_

Oct 10

There are paper specs and real specs. Today’s the first day we see real-world performance at scale! Excited to see this evolve over time!

SemiAnalysis

@SemiAnalysis_

Oct 9

InferenceMAX™: Open Source Inference Benchmarking
Support from OpenAI, @LisaSu, @AnushElangovan, @ia_buck, @tri_dao, and many more.
NVIDIA GB200 NVL72, AMD MI355X
Throughput Token per GPU, Latency Tok/s/user
Perf per Dollar, Cost per Million Tokens, Tokens per Provisioned…

Fabricated Knowledge

@_fabknowledge_

Oct 10

One random viz I wish I could see: when I’m on the subway, the explosion of RF as a train pulls into a new station. I can’t imagine what it looks like. It’s probably pure chaos, and I legit wish I could see RF to witness it.

Fabricated Knowledge reposted

NVIDIA

@nvidia

Oct 9

📣 NVIDIA Blackwell sets the standard for AI inference on SemiAnalysis InferenceMAX. Our most recent results on the independent benchmarks show NVIDIA’s Blackwell Platform leads AI factory ROI. See how NVIDIA Blackwell GB200 NVL72 can yield $75 million in token revenue over…

Fabricated Knowledge

@_fabknowledge_

Oct 10

I really do not think people appreciate what this is: there has never been a source of truth for GPU throughput. Specs on paper have never meant anything. This is IT!!!

Dylan Patel

@dylan522p

Oct 9

InferenceMAX™: Open Source Inference Benchmarking

Source: newsletter.semianalysis.com

Fabricated Knowledge reposted

tender

@tenderizzation

Oct 9

Fabricated Knowledge reposted

Dylan Patel

@dylan522p

Oct 9

Today we are launching InferenceMAX! We have support from Nvidia, AMD, OpenAI, Microsoft, PyTorch, SGLang, vLLM, Oracle, CoreWeave, TogetherAI, Nebius, Crusoe, HPE, Supermicro, Dell. It runs every day on the latest software (vLLM, SGLang, etc.) across hundreds of GPUs, $10Ms of…

Dylan Patel

@dylan522p

Oct 8

Going to be dropping something huge in 24 hours. I think it'll reshape how everyone thinks about chips, inference, and infrastructure. It's directly supported by NVIDIA, AMD, Microsoft, OpenAI, Together AI, CoreWeave, Nebius, PyTorch Foundation, Supermicro, Crusoe, HPE, Tensorwave,…

Fabricated Knowledge

@_fabknowledge_

Oct 9

SemiAnalysis is back on Substack! open.substack.com/pub/semianalys… And we are coming with the biggest piece we’ve done in the AI space: welcome to InferenceMAX. If you want to know what AMD does versus NVDA, here’s the answer.

InferenceMAX™: Open Source Inference Benchmarking

Source: newsletter.semianalysis.com

Fabricated Knowledge

@_fabknowledge_

Oct 9

INFERENCE MAXXXXX!!!

Fabricated Knowledge reposted

METR

@METR_Evals

Oct 9

We estimate that Claude Sonnet 4.5 has a 50%-time-horizon of around 1 hr 53 min (95% confidence interval of 50 to 235 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min.

Fabricated Knowledge reposted

SemiAnalysis

@SemiAnalysis_

Oct 9

China’s State Council on October 9 approved Order No. 61 of 2025, announcing export controls on certain overseas rare-earth items. This marks the fourth round of rare-earth export restriction efforts; the previous round was on April 8. (1/8)🧵

Fabricated Knowledge reposted

Andrej Karpathy

@karpathy

Oct 9

I don't know what labs are doing to these poor LLMs during RL but they are mortally terrified of exceptions, in any infinitesimally likely case. Exceptions are a normal part of life and healthy dev process. Sign my LLM welfare petition for improved rewards in cases of exceptions.

Fabricated Knowledge

@_fabknowledge_

Oct 9

I'm hoping this goes off smoothly today, without a hitch, just like our transition back to Substack did. That being said, SOOOOOON TM

Dylan Patel

@dylan522p

Oct 8

Fabricated Knowledge

@_fabknowledge_

Oct 8

There’s another way to think about it. If one product is 2nm and the other is 3nm, yet one has better performance, they call the difference “margin”.

Daniel Romero

@HyperTechInvest

Oct 7

🚨 Lisa Su dropped a bombshell, yet nobody has caught it. $AMD's MI450 will use 2nm technology, while $NVDA's Rubin will use 3nm: a massive power and efficiency advantage. This is breaking news, and I don’t understand why nobody is reporting it.