
Nicholas Wilt
@CUDAHandbook
Nicholas Wilt was on the inception team for CUDA, wrote The CUDA Handbook, and writes at https://parallelprogrammer.substack.com
Well when you lay it all out like that, yeah, it does seem implausible. But all true
The deeper you go into the semiconductor supply chain, the less believable it becomes. > TSMC, a company on a small island, produces over 90% of the world’s most advanced chips > TSMC relies on Dutch company ASML for EUV lithography machines > ASML depends on German company…
In a previous life, I optimized some GPU-assisted code that was being operated by an interpreter. Full CUDA port was 60x faster. I think the GPU often was done before control had been returned to the caller.
I wonder how many H100 hours have been burned waiting a handful of milliseconds for Python to "glue" the next task to the previous one
Well, hold on a sec. They also won because even without TensorCores, they were compellingly faster than CPUs. Geoffrey Hinton reportedly was touting GPUs for AI in 2009!
gpus didn’t win because they had magic math units. they won because cuda exposed a control plane that let people experiment packing work close to memory, overlapping loads, hiding latency. specialised ai chips bake these optimizations directly into hardware but that’s a wager on…
Whoever said this has never ported a workload to CUDA for a 100x speedup and I feel sorry for them
There's a saying, "if you get a 5x speedup you did something smart, if you get a 100x speedup you stopped doing something dumb", but that's just not true in SAT solving! I just got a 3000x speed-up in this search (from 15 days to 6 minutes), from making better SAT instances.
With benefit of hindsight, it may have been a mistake to reorg Windows under Azure. I have been a PC/Microsoft partisan for 40 years, worked on Windows while at Microsoft, and may eschew Windows on my next laptop.
Microsoft is plugging more holes that let you use Windows 11 without an online account. Microsoft really doesn’t want you creating a local account on Windows 11. Full details on the changes 👇 theverge.com/news/793579/mi…
Banning PowerPoint has other salubrious effects. Amazon famously banned PowerPoint decades ago, because it enables companies to lie to themselves. I have little to no respect for Sun on other matters: business, IP, technological. inc.com/justin-bariso/…
In a 1997 interview with BYTE, Scott McNealy, then CEO of Sun Microsystems, arguably the most renowned Unix company of the time, spoke disparagingly about Unix, remarking, “The problem with Unix is that nobody protected the brand to mean something and the brand lost value”. At…

The weather app on my phone is p good
What's a piece of software you use regularly that you can't think of a single complaint about?
I was using the word “insane,” too, but maybe in a different sense. The clock cycles were Intel’s to lose. I wrote an article about this: “Why CUDA Succeeded” open.substack.com/pub/parallelpr…
tbh, Intel's software stack is actually insane. they have tools for everything. they could have been The Nvidia if only...

I wrote a short historical take on the evolution of SIMD on x86: open.substack.com/pub/parallelpr…
For those wondering what AVX is: your CPU normally, on each core, applies one operation to one piece of data. This gets quite slow for bigger data :( So why not use the GPU? That adds latency and overhead. What if we want parallel operations, but it isn’t worth offloading?…

I pushed CMake files into the CUDA Handbook repository, to make it easier to build the samples. Among other things, the exercise underscored the need to revisit the code that uses texturing (CUDA 12 no longer supports texture references). Feedback welcome! github.com/ArchaeaSoftwar…
Next stop in this journey: cache lines 👀
The next crushing fact: CPUs don't think in bytes, they think in words (they operate on registers, usually word-sized chunks).
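A toy model of the words-not-bytes point, assuming 8-byte words and 64-byte cache lines (typical for x86-64, but an assumption here). A byte address splits into a word index and a byte offset. A byte "load" can be modeled as fetching the aligned word and then shifting and masking out the byte. Real CPUs such as x86 have native byte loads, so treat this as a conceptual sketch. The helpers `locate` and `load_byte` are hypothetical names for illustration.

```python
WORD_BYTES = 8   # assumption: 64-bit registers / words
LINE_BYTES = 64  # assumption: a common x86-64 cache-line size


def locate(addr: int) -> tuple[int, int, int]:
    """Return (word index, byte offset within that word, cache-line index) for a byte address."""
    return addr // WORD_BYTES, addr % WORD_BYTES, addr // LINE_BYTES


def load_byte(memory: bytes, addr: int) -> int:
    """Toy model: fetch the whole aligned word, then shift and mask out the requested byte."""
    word_idx, offset, _ = locate(addr)
    start = word_idx * WORD_BYTES
    # Interpret the aligned 8-byte chunk as one little-endian word, as x86 would.
    word = int.from_bytes(memory[start:start + WORD_BYTES], "little")
    return (word >> (8 * offset)) & 0xFF


mem = bytes(range(256))
print(locate(77))          # (9, 5, 1): word 9, byte 5 of that word, cache line 1
print(load_byte(mem, 77))  # 77
```

Under these assumptions, every byte in the same 8-byte word, and every word in the same 64-byte line, comes along for free once the containing unit is fetched.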
This article describes how to do the insertion step for Insertion Sort into an AVX2 register. A combination of shifts, XOR, and blends after the compare. (link in reply)
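One way the compare, shift, XOR, and blend steps could fit together is sketched below. This is a reconstruction, not necessarily the article's exact instruction sequence. It is a plain-Python model of eight 32-bit lanes, where each line stands for one vector operation. On AVX2 the corresponding intrinsics would plausibly be `_mm256_cmpgt_epi32` (compare), `_mm256_permutevar8x32_epi32` (cross-lane shift), `_mm256_xor_si256`, and `_mm256_blendv_epi8`. The function name `insert_step` is hypothetical. The sketch assumes the register already holds a sorted run, and the largest element falls off the end.

```python
LANES = 8  # an AVX2 __m256i holds eight 32-bit integers


def insert_step(v: list[int], x: int) -> list[int]:
    """Insert x into the sorted 8-lane vector v; the largest element is dropped."""
    assert len(v) == LANES
    # Compare: lane i becomes all-ones (-1) where v[i] > x, else 0.
    gt = [-1 if a > x else 0 for a in v]
    # Shift data and mask up by one lane; the mask's lane 0 is filled with 0 (false).
    v_up = [v[0]] + v[:-1]
    gt_up = [0] + gt[:-1]
    # gt is monotonic (0...0, -1...-1), so XOR with its shifted copy
    # is nonzero only in the first lane where v[i] > x: the insertion slot.
    slot = [a ^ b for a, b in zip(gt, gt_up)]
    # Blend 1: lanes after the slot take the shifted-up data.
    out = [u if m else a for a, u, m in zip(v, v_up, gt_up)]
    # Blend 2: the slot lane takes x.
    return [x if s else o for o, s in zip(out, slot)]


print(insert_step([1, 3, 5, 7, 9, 11, 13, 15], 6))
# [1, 3, 5, 6, 7, 9, 11, 13]
```

If x is not smaller than any lane, the slot mask is all zeros and the vector comes back unchanged, which in this model means x itself is the element that gets dropped.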