I am really enjoying using Claude Code with Sonnet 4.5! It's super smart, and super fast!
Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

😍
Tri Dao says Claude Code makes him 1.5x more productive and that it's quite helpful at writing Triton kernels

Good list! Simon’s and Pranjal’s matmul blogs were foundational to my understanding of GPU performance
Some perf related must-reads: • How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: siboehm.com/articles/22/CU… • Outperforming cuBLAS on H100: a Worklog: cudaforfun.substack.com/p/outperformin… • Defeating Nondeterminism in LLM Inference: thinkingmachines.ai/blog/defeating… • Making Deep…
lol, there is quite the explosion of kernel DSLs lately (triton, tilelang, gluon, TLX, cuteDSL, cuTile, …) And honestly as much as I love TLX and want it to succeed, I think the next big kernel programming language might be… natural, human language
Just one more DSL bro. I promise bro just one more DSL and we'll fix hardware adoption. It's just a better DSL bro. Please just one more. One more DSL and we'll port all the kernels. I just need one more DSL
A great example of how subtle numerical issues can bite you. Fusing multiply-add into FMA strictly improves precision of that operation (the mul is computed in infinite precision). But it breaks the larger expression if other parts are not computed with the same high precision!
where have I seen something like this before "This caused a mismatch: operations that should have agreed on the highest probability token were running at different precision levels. The precision mismatch meant they didn't agree on which token had the highest probability." see…
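The FMA point above can be seen in a few lines. This is only a sketch: the `fma` helper is a hypothetical emulation built on exact rational arithmetic (one rounding at the end), not a hardware instruction or any compiler's actual lowering, and the input value is chosen just to make the effect visible.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add (hypothetical helper).

    Computes a*b + c exactly with rationals, then rounds once to a float,
    which is the defining property of a fused multiply-add.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30

# Unfused: both products are rounded the same way, so they cancel exactly.
unfused = a * a - a * a

# "Partially fused": one product is rounded to a float first, the other is
# kept exact inside the FMA. The rounding error of the first product leaks out.
fused = fma(a, a, -(a * a))

print(unfused)  # 0.0
print(fused)    # 2**-60, about 8.67e-19
```

The fused result is the more accurate answer to "exact a*a minus rounded a*a", but the larger expression was supposed to be zero, and a downstream comparison that relied on that would now disagree.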

Debugging subtle numerical issues in ML systems - especially in the compiler! - is really freaking hard. Very impressed by the debugging that went into discovering these
In our investigation, we uncovered three separate bugs. They were partly overlapping, making diagnosis even trickier. We've now resolved all three bugs and written a technical report on what happened, which you can find here: anthropic.com/engineering/a-…
I found myself wondering if we might benefit from a resurgence of Halide- (or TVM) -like ideas, where you have some math to optimize, and a library of optimizations through which the machine can search to find the best outcome
What if instead of a paperclip maximizer we built a hoodie-and-bag maximizer, and what if it turns out we’re already living in its world?? 😱
We put some hooks for hoodies and bags above the bench to put our shoes on by the front door. Of course it's still only a matter of time before hoodies and bags conquer everything
I’ve seen this “FAANG vibe code” post a few times and tbh this workflow sounds… tedious. “Write a design doc, align stakeholders, iterate on design review, etc” How about: have an awesome idea and build it

TIL, RIP Triton, killed by inability to have good Blackwell performance

This is cool work on kernel generation. A challenge I’d like to see solved: given a brand new architecture doc, generate an updated kernel. (Possibly leveraging the kernels for the old platform, if it helps)
Excited to share what friends and I have been working on at @Standard_Kernel We've raised from General Catalyst (@generalcatalyst), Felicis (@felicis), and a group of exceptional angels. We have some great H100 BF16 kernels in pure CUDA+PTX, featuring: - Matmul 102%-105% perf…

Also there’s a pytorch/ directory. It does not contain pytorch
it will never stop being funny to me that PyTorch's directory in our internal monorepo is caffe2/
The most important skill for a researcher is not technical ability. It's taste. The ability to identify interesting and tractable problems, and recognize important ideas when they show up. This can't be taught directly. It's cultivated through curiosity and broad reading.
I like python but weird stuff happens with imports. We had a bug with Triton-related imports in FB’s stack that no one could explain or repro outside of prod. We “fixed” it by generally cleaning up some imports but I would have much preferred a definitive explanation
This is why I never professionally used, and never ever professionally will use Python; it is simply too weird. Many people don't realize that imports in Python are regular "simple" statements and can cause arbitrary code execution, since the module it imports is a list of…
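For anyone who hasn't run into the import behavior mentioned above, here is a minimal sketch. The module name `side_effect_demo` and the environment variable are made up for illustration; the point is only that a module's top-level code runs when it is first imported.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Source of a throwaway module (hypothetical name) whose top level has a side effect.
MODULE_SOURCE = (
    "import os\n"
    "os.environ['SIDE_EFFECT_DEMO'] = 'set at import time'\n"
    "VALUE = 42\n"
)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "side_effect_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)  # make the temp dir importable
    try:
        # Importing executes the module body, including the os.environ write.
        mod = importlib.import_module("side_effect_demo")
    finally:
        sys.path.remove(tmp)

print(mod.VALUE)                       # 42
print(os.environ["SIDE_EFFECT_DEMO"])  # set at import time
```

Nothing in the importing file mentions the environment variable, which is part of why import-order bugs like the one described can be hard to reproduce outside the environment where they occur.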