Bert Maher

@tensorbert

I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)

I am really enjoying using Claude Code with Sonnet 4.5! It's super smart, and super fast!

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.


Bert Maher reposted

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

😍

Tri Dao says Claude Code makes him 1.5x more productive and that it's quite helpful at writing Triton kernels


Good list! Simon’s and Pranjal’s matmul blogs were foundational to my understanding of GPU performance

Some perf related must-reads:
• How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: siboehm.com/articles/22/CU…
• Outperforming cuBLAS on H100: a Worklog: cudaforfun.substack.com/p/outperformin…
• Defeating Nondeterminism in LLM Inference: thinkingmachines.ai/blog/defeating…
• Making Deep…



lol, there is quite the explosion of kernel DSLs lately (triton, tilelang, gluon, TLX, cuteDSL, cuTile, …) And honestly as much as I love TLX and want it to succeed, I think the next big kernel programming language might be… natural, human language

Just one more DSL bro. I promise bro just one more DSL and we'll fix hardware adoption. It's just a better DSL bro. Please just one more. One more DSL and we'll port all the kernels. I just need one more DSL



A great example of how subtle numerical issues can bite you. Fusing multiply-add into FMA strictly improves precision of that operation (the mul is computed in infinite precision). But it breaks the larger expression if other parts are not computed with the same high precision!

where have I seen something like this before "This caused a mismatch: operations that should have agreed on the highest probability token were running at different precision levels. The precision mismatch meant they didn't agree on which token had the highest probability." see…
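The FMA point can be illustrated with a small sketch. It assumes an emulated fused multiply-add built on exact rational arithmetic (the helper name `fma` here is hypothetical, not a hardware or library intrinsic), and compares `a*a - a*a` with and without fusing one of the two products. This is only a toy illustration, not the bug described in the quoted post.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: computes a*b + c with a single rounding."""
    # Fraction represents each float exactly, so the product and the sum
    # are exact; float() then rounds once to the nearest double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 0.1
p = a * a  # product rounded to a double

# Unfused: both products are rounded the same way, so they cancel exactly.
unfused = a * a - p

# Fused: the exact product minus the rounded product leaves the rounding
# error of p, which is tiny but not zero.
fused = fma(a, a, -p)

print("unfused:", unfused)
print("fused:  ", fused)
```

The fused result is the more precise answer for that single operation, yet an expression that should be exactly zero no longer is, because the two products were computed at different precision.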


Debugging subtle numerical issues in ML systems - especially in the compiler! - is really freaking hard. Very impressed by the debugging that went into discovering these

In our investigation, we uncovered three separate bugs. They were partly overlapping, making diagnosis even trickier. We've now resolved all three bugs and written a technical report on what happened, which you can find here: anthropic.com/engineering/a-…



I found myself wondering if we might benefit from a resurgence of Halide- (or TVM) -like ideas, where you have some math to optimize, and a library of optimizations through which the machine can search to find the best outcome


What if instead of a paperclip maximizer we built a hoodie-and-bag maximizer, and what if it turns out we’re already living in its world?? 😱

We put some hooks for hoodies and bags above the bench where we put our shoes on by the front door. Of course it's still only a matter of time before hoodies and bags conquer everything



I’ve seen this “FAANG vibe code” post a few times and tbh this workflow sounds… tedious. “Write a design doc, align stakeholders, iterate on design review, etc” How about: have an awesome idea and build it

Bert Maher reposted

TIL, RIP Triton, killed by inability to have good Blackwell performance




This is cool work on kernel generation. A challenge I’d like to see solved: given a brand new architecture doc, generate an updated kernel. (Possibly leveraging the kernels for the old platform, if it helps)

Excited to share what friends and I have been working on at @Standard_Kernel We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring: - Matmul 102%-105% perf…


Also there’s a pytorch/ directory. It does not contain pytorch

it will never stop being funny to me that PyTorch's directory in our internal monorepo is caffe2/



Bert Maher reposted

The most important skill for a researcher is not technical ability. It's taste. The ability to identify interesting and tractable problems, and recognize important ideas when they show up. This can't be taught directly. It's cultivated through curiosity and broad reading.


I like python but weird stuff happens with imports. We had a bug with Triton-related imports in FB’s stack that no one could explain or repro outside of prod. We “fixed” it by generally cleaning up some imports but I would have much preferred a definitive explanation

This is why I never professionally used, and never ever professionally will use Python; it is simply too weird. Many people don't realize that imports in Python are regular "simple" statements and can cause arbitrary code execution, since the module it imports is a list of…
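A minimal sketch of the "imports run code" point, assuming a throwaway module (the name `noisy_mod` and the marker file are hypothetical, made up for this illustration): the module's top-level statements write a file, and that only happens once the import executes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module with a side effect at top level.
MODULE_SOURCE = '''
from pathlib import Path
# Top-level statements run when the module is first imported.
Path(__file__).with_name("marker.txt").write_text("import ran this")
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "noisy_mod.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        marker = Path(tmp, "marker.txt")
        existed_before = marker.exists()
        # Same effect as writing `import noisy_mod`.
        importlib.import_module("noisy_mod")
        existed_after = marker.exists()
    finally:
        # Undo the changes to interpreter state made for the demo.
        sys.path.remove(tmp)
        sys.modules.pop("noisy_mod", None)

print("marker before import:", existed_before)
print("marker after import: ", existed_after)
```

Because the side effect depends on when and in what order modules are first imported, the same code can behave differently across environments, which is one way import bugs end up hard to reproduce.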


