
I'm building my own PyTorch from scratch, implementing core features like tensors, autograd, NN layers, optimizers, and more. The core engine will be written in C++/CUDA for performance, with a Pythonic, PyTorch-like API. Starting C++ today.

helpful diagram for memory hierarchy explanation:

A Detailed Explanation of CUDA Thread Hierarchy (Threads, Blocks, and Grids): A CUDA thread is the smallest unit of execution on the GPU, similar to a CPU thread but designed for massive parallelism. Each thread has its own registers and runs the same kernel code independently,…


update on my framework:
> started implementation of the Autograd Engine, which has a few key tasks: track tensor dependencies, compute gradients automatically, and backpropagate through operations.
> added a computation graph to the framework: forward pass works for…
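The three tasks listed above (track dependencies, compute gradients, backpropagate) can be sketched on scalars. This is not the framework's actual code: `Value`, `make_value`, `add`, `mul`, and `backward` are hypothetical names, and a real engine would store tensors rather than a single `double`.

```cpp
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical scalar node; a real framework would hold a tensor here.
struct Value {
    double data;
    double grad = 0.0;
    std::vector<std::shared_ptr<Value>> parents;  // dependencies recorded in the forward pass
    std::function<void()> backward_fn = [] {};    // pushes this node's grad to its parents
    explicit Value(double d) : data(d) {}
};
using ValuePtr = std::shared_ptr<Value>;

ValuePtr make_value(double d) { return std::make_shared<Value>(d); }

ValuePtr add(const ValuePtr& a, const ValuePtr& b) {
    ValuePtr out = make_value(a->data + b->data);
    out->parents = {a, b};
    Value* o = out.get();  // raw pointer: the closure must not own its own node
    out->backward_fn = [a, b, o] {
        a->grad += o->grad;  // d(a+b)/da = 1
        b->grad += o->grad;  // d(a+b)/db = 1
    };
    return out;
}

ValuePtr mul(const ValuePtr& a, const ValuePtr& b) {
    ValuePtr out = make_value(a->data * b->data);
    out->parents = {a, b};
    Value* o = out.get();
    out->backward_fn = [a, b, o] {
        a->grad += b->data * o->grad;  // d(a*b)/da = b
        b->grad += a->data * o->grad;  // d(a*b)/db = a
    };
    return out;
}

// Reverse-mode pass: topologically sort the graph, then apply the chain rule
// from the output back to the leaves.
void backward(const ValuePtr& root) {
    std::vector<Value*> order;
    std::unordered_set<Value*> seen;
    std::function<void(Value*)> visit = [&](Value* v) {
        if (!seen.insert(v).second) return;  // already visited
        for (const auto& p : v->parents) visit(p.get());
        order.push_back(v);  // parents land before the nodes that use them
    };
    visit(root.get());
    root->grad = 1.0;  // d(root)/d(root)
    for (auto it = order.rbegin(); it != order.rend(); ++it) (*it)->backward_fn();
}
```

For `z = x*y + x`, calling `backward(z)` leaves `y + 1` in `x->grad` and `x` in `y->grad`. Gradients accumulate with `+=` so a value used twice in the graph receives both contributions.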

> added sum() and mean() in math ops
> added more creation and view operations like: .zeros() .ones() .randn() .arange() .permute() .contiguous() .expand() .unsqueeze() .squeeze()
so far my framework has:
> Tensor ops (core data structure, indexing, slicing, etc)
> CPU ops (math, matmul,…
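A rough sketch of how view operations such as permute and unsqueeze can work without copying data, assuming a shape-plus-strides layout over shared storage (the post does not say how the framework stores tensors). `View`, `from_data`, `to_vector`, `sum_all`, and `mean_all` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical strided view: ops below only rewrite shape/strides.
struct View {
    std::shared_ptr<std::vector<float>> storage;  // shared between views
    std::vector<std::size_t> shape;
    std::vector<std::size_t> strides;  // step per dimension, in elements
};

// Wraps data in a row-major (contiguous) view.
View from_data(std::vector<float> data, std::vector<std::size_t> shape) {
    View v;
    v.strides.assign(shape.size(), 1);
    for (std::size_t d = shape.size(); d-- > 1;)
        v.strides[d - 1] = v.strides[d] * shape[d];
    v.storage = std::make_shared<std::vector<float>>(std::move(data));
    v.shape = std::move(shape);
    return v;
}

// permute: reorder shape and strides, storage untouched.
View permute(const View& v, const std::vector<std::size_t>& dims) {
    assert(dims.size() == v.shape.size());
    View out{v.storage, {}, {}};
    for (std::size_t d : dims) {
        out.shape.push_back(v.shape[d]);
        out.strides.push_back(v.strides[d]);
    }
    return out;
}

// unsqueeze: insert a size-1 dimension; its stride is never stepped over.
View unsqueeze(const View& v, std::size_t dim) {
    assert(dim <= v.shape.size());
    View out = v;
    std::size_t stride = dim < v.shape.size() ? v.shape[dim] * v.strides[dim] : 1;
    auto pos = static_cast<std::ptrdiff_t>(dim);
    out.shape.insert(out.shape.begin() + pos, 1);
    out.strides.insert(out.strides.begin() + pos, stride);
    return out;
}

// Reads the elements in the view's logical (row-major) order.
std::vector<float> to_vector(const View& v) {
    std::size_t n = 1;
    for (std::size_t s : v.shape) n *= s;
    std::vector<float> out;
    out.reserve(n);
    std::vector<std::size_t> idx(v.shape.size(), 0);
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t off = 0;
        for (std::size_t d = 0; d < idx.size(); ++d) off += idx[d] * v.strides[d];
        out.push_back((*v.storage)[off]);
        // advance the multi-index like an odometer, last dimension fastest
        for (std::size_t d = idx.size(); d-- > 0;) {
            if (++idx[d] < v.shape[d]) break;
            idx[d] = 0;
        }
    }
    return out;
}

// contiguous: the one op here that copies, producing fresh row-major storage.
View contiguous(const View& v) { return from_data(to_vector(v), v.shape); }

float sum_all(const View& v) {
    float total = 0.0f;
    for (float x : to_vector(v)) total += x;
    return total;
}

float mean_all(const View& v) {
    std::size_t n = 1;
    for (std::size_t s : v.shape) n *= s;
    return sum_all(v) / static_cast<float>(n);
}
```

Permuting a 2x3 view to 3x2 only swaps the two strides, so both views read the same buffer; calling `contiguous` afterwards is what actually moves data.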


finally i can sleep, updates on my framework tmr.



i didn't back up my framework's Obsidian vault before switching to Fedora, fml 😭😭


matrix addition in CUDA. the first picture launches 1 block of NxN threads.
~ each thread computes one element of the matrix
> dim3 threadsPerBlock(N,N);
> MatAdd<<<1, threadsPerBlock>>>(d_A, d_B, d_C, N);
this works fine with small matrices, as i did N=4, which is only 16…
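To make the one-thread-per-element mapping concrete, here is a CPU-only sketch in plain C++ that emulates the launch with nested loops. It is not CUDA and not the original `MatAdd` kernel: `Dim2`, `mat_add_thread`, and `launch_mat_add` are hypothetical stand-ins, and the "threads" run one after another instead of in parallel. The index arithmetic matches what a kernel would typically compute from `blockIdx`, `blockDim`, and `threadIdx`. A CUDA block is capped at 1024 threads on current NVIDIA GPUs, so a single NxN block only works up to N=32; larger matrices need a grid of several blocks, which the same arithmetic handles.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for the x/y parts of CUDA's dim3 (assumption: 2D launches only).
struct Dim2 {
    int x;
    int y;
};

// Body of the emulated kernel: one "thread" handles one matrix element.
void mat_add_thread(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx,
                    const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, int N) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < N && col < N) {  // guard: the grid may overshoot the matrix edge
        std::size_t i = static_cast<std::size_t>(row) * static_cast<std::size_t>(N) +
                        static_cast<std::size_t>(col);
        C[i] = A[i] + B[i];
    }
}

// Emulates a <<<grid, block>>> launch by visiting every thread serially.
std::vector<float> launch_mat_add(Dim2 grid, Dim2 block,
                                  const std::vector<float>& A,
                                  const std::vector<float>& B, int N) {
    std::vector<float> C(static_cast<std::size_t>(N) * static_cast<std::size_t>(N), 0.0f);
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx)
                    mat_add_thread({bx, by}, block, {tx, ty}, A, B, C, N);
    return C;
}
```

`launch_mat_add({1, 1}, {4, 4}, A, B, 4)` mirrors the single-block N=4 case from the post. With `{2, 2}` blocks of `{4, 4}` threads and N=5, the bounds check is what keeps the extra threads from writing past the matrix.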




wrote my first CUDA program, a simple c = a + b. what it's doing:
> allocate CPU mem and fill with input data
> allocate GPU mem
> copy input data CPU to GPU
> GPU kernel computes
> copy result back GPU to CPU
> print results from CPU
> free GPU mem 😭


Building LearnFlow
- learnt about background jobs and queues
- used Celery as the task queue system and Redis as the message broker
- offloaded the goal inactivity reminder email task to the background through this
more details below




fedora is amazing. no ricing yet, just setting up stuff atm
Had my fun with Mint. It's amazing and caused 0 problems since install, but I got bored of it. Installing Fedora


guy is building crazy stuff
(1/n) Building LearnFlow
- implemented throttling for Free/Premium users: 10/day for free and 1000 for premium (for the sake of testing)
- cached monthly and weekly progress summaries
- benchmark tests for cached and non-cached responses, insane diff
more info below








cool stuff, give it a read
I kinda went deeper into tokenization in NLP, partly because I found it super interesting and partly because a friend asked me to explain it :p Turns out it’s way more than just “splitting text on spaces.” It’s the foundation of how LLMs actually see language, and the choice of…

