#100daysofcuda search results

jino_rohit: day17 #100daysofcuda

Today I implemented a naive outer vector add and a block-wise outer vector add.

Also spent some time reading about PyTorch graph computation.

Code - github.com/JINO-ROHIT/adv…
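
Not the repo's code, but for anyone following along, here is a minimal sketch of what a block-wise outer vector add (out[i, j] = x[i] + y[j]) can look like in Triton. The kernel name, block sizes, and wrapper are illustrative assumptions.

import torch
import triton
import triton.language as tl

@triton.jit
def outer_add_kernel(x_ptr, y_ptr, out_ptr, N, M,
                     BLOCK_N: tl.constexpr, BLOCK_M: tl.constexpr):
    pid_n = tl.program_id(axis=0)  # which block of x elements (rows of out)
    pid_m = tl.program_id(axis=1)  # which block of y elements (cols of out)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    x = tl.load(x_ptr + offs_n, mask=offs_n < N, other=0.0)
    y = tl.load(y_ptr + offs_m, mask=offs_m < M, other=0.0)
    tile = x[:, None] + y[None, :]  # (BLOCK_N, BLOCK_M) tile of the outer sum
    out_offs = offs_n[:, None] * M + offs_m[None, :]
    mask = (offs_n[:, None] < N) & (offs_m[None, :] < M)
    tl.store(out_ptr + out_offs, tile, mask=mask)

def outer_add(x, y, BLOCK_N=64, BLOCK_M=64):
    # Assumes contiguous 1D inputs on the GPU.
    N, M = x.numel(), y.numel()
    out = torch.empty((N, M), device=x.device, dtype=x.dtype)
    grid = (triton.cdiv(N, BLOCK_N), triton.cdiv(M, BLOCK_M))
    outer_add_kernel[grid](x, y, out, N, M, BLOCK_N=BLOCK_N, BLOCK_M=BLOCK_M)
    return out

# x = torch.randn(1000, device="cuda"); y = torch.randn(500, device="cuda")
# torch.allclose(outer_add(x, y), x[:, None] + y[None, :])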

xkxxhk: I did it!!! I did it guys. It was a nice run. All thanks to @hkproj for starting #100DaysofCuda

CC: @salykova_

salykova_: It's your turn now @mobicham @ajhinh. Towards a SOTA grayscale kernel 🤣💀 Jokes aside, the GPU Kernel Leaderboard is a great place to sharpen your CUDA skills and learn how to build the fastest kernels. Maybe @__tinygrad__ will make a comeback. Let's see. Link below.


jino_rohit: day23 #100daysofcuda

Implemented matrix multiplication over batches.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: day19 #100daysofcuda

Implemented a sum reduction across a batch of numbers.

Code - github.com/JINO-ROHIT/adv…
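
A minimal sketch of one way to sum each row of a batch in Triton, assuming every row fits in a single block; the kernel and helper names below are illustrative, not taken from the linked repo.

import torch
import triton
import triton.language as tl

@triton.jit
def row_sum_kernel(x_ptr, out_ptr, n_cols, row_stride, BLOCK_SIZE: tl.constexpr):
    row = tl.program_id(axis=0)                 # one program per row of the batch
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * row_stride + cols, mask=mask, other=0.0)
    tl.store(out_ptr + row, tl.sum(x, axis=0))  # reduce the block to a single scalar

def batch_sum(x):
    n_rows, n_cols = x.shape
    out = torch.empty(n_rows, device=x.device, dtype=x.dtype)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)  # assumes one block covers a row
    row_sum_kernel[(n_rows,)](x, out, n_cols, x.stride(0), BLOCK_SIZE=BLOCK_SIZE)
    return out

# x = torch.randn(32, 1024, device="cuda")
# torch.allclose(batch_sum(x), x.sum(dim=1), atol=1e-5)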

jino_rohit: day22 #100daysofcuda

Implemented a convolution2d kernel in Triton :)

Code - github.com/JINO-ROHIT/adv…

jino_rohit: Day 12 #100daysofcuda

Implemented a ReLU kernel in Triton.

Code - github.com/JINO-ROHIT/adv…
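
For reference, an elementwise ReLU kernel in Triton tends to look roughly like the sketch below; the names and block size are illustrative, not the repo's exact code.

import torch
import triton
import triton.language as tl

@triton.jit
def relu_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, tl.maximum(x, 0.0), mask=mask)  # relu(x) = max(x, 0)

def relu(x, BLOCK_SIZE=1024):
    # Assumes a contiguous tensor on the GPU.
    out = torch.empty_like(x)
    n = x.numel()
    relu_kernel[(triton.cdiv(n, BLOCK_SIZE),)](x, out, n, BLOCK_SIZE=BLOCK_SIZE)
    return out

# x = torch.randn(4096, device="cuda")
# torch.equal(relu(x), torch.relu(x))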

"10 days into CUDA, and I’ve earned my first badge of honor! 🚀 From simple kernels to profiling, every day is a step closer to mastering GPU computing. Onward to 100! #CUDA #GPUProgramming #100DaysOfCUDA"

limbizzz11's tweet image. "10 days into CUDA, and I’ve earned my first badge of honor! 🚀 From simple kernels to profiling, every day is a step closer to mastering GPU computing. Onward to 100! #CUDA #GPUProgramming #100DaysOfCUDA"

jino_rohit: day10 #100daysofcuda

This marks 1/10th of the journey of consistently learning CUDA and Triton kernels. The goal is to write for Unsloth and learn to build an optimal inference engine.

Wrote a Triton kernel for sigmoid.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: day15 #100daysofcuda

Implemented leaky ReLU and ELU kernels in Triton.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: day20 #100daysofcuda

Today, I implemented softmax, actually a numerically stable version of softmax.

Code - github.com/JINO-ROHIT/adv…
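
The "numerically stable" part usually means subtracting the row maximum before exponentiating, so exp() cannot overflow for large logits. Below is a minimal row-wise sketch in Triton, assuming each row fits in one block; the names are illustrative rather than copied from the repo.

import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(x_ptr, out_ptr, n_cols, row_stride, BLOCK_SIZE: tl.constexpr):
    row = tl.program_id(axis=0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * row_stride + cols, mask=mask, other=-float("inf"))
    x = x - tl.max(x, axis=0)          # subtract the row max so exp() stays bounded
    num = tl.exp(x)
    out = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * row_stride + cols, out, mask=mask)

def softmax(x):
    # Assumes a contiguous 2D tensor with rows short enough to fit in one block.
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    softmax_kernel[(n_rows,)](x, out, n_cols, x.stride(0), BLOCK_SIZE=BLOCK_SIZE)
    return out

# x = torch.randn(16, 1000, device="cuda") * 100   # large logits, still stable
# torch.allclose(softmax(x), torch.softmax(x, dim=1), atol=1e-6)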

jino_rohit: day21 #100daysofcuda

Implementing a scalar version of flash attention. Should be quite simple to read and understand.
Code - github.com/JINO-ROHIT/adv…
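
The core idea of a scalar flash attention is the online-softmax recurrence: walk over the keys and values one at a time while keeping a running max, a running normalizer, and a running output, so the full attention matrix is never materialized. A tiny pure-Python sketch for a single query (illustrative only, not the repo's implementation):

import math

def scalar_flash_attention(q, keys, values):
    """Attention for one query, consuming one key/value pair at a time."""
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    m = float("-inf")   # running max of the scores
    l = 0.0             # running sum of exp(score - m)
    acc = [0.0] * d     # running weighted sum of values
    for k, v in zip(keys, values):
        score = scale * sum(qi * ki for qi, ki in zip(q, k))
        m_new = max(m, score)
        alpha = math.exp(m - m_new)          # rescale old state (exp(-inf) == 0 on step 1)
        p = math.exp(score - m_new)
        l = l * alpha + p
        acc = [a * alpha + p * vi for a, vi in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]

# q = [1.0, 0.0]; keys = [[1.0, 0.0], [0.0, 1.0]]; values = [[1.0, 2.0], [3.0, 4.0]]
# Matches softmax(q @ K.T / sqrt(d)) @ V computed the ordinary way.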

jino_rohit: day9 #100daysofcuda

Wrote a Triton kernel for the tanh activation. Going to focus on getting more submissions on Tensara and optimizing it further.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: day18 #100daysofcuda

1. Implemented fused outer multiplication along with its backward pass.
2. Going through some Liger kernels.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: Day16 #100daysofcuda

1. Kept it simple and implemented a GELU kernel.
2. Spent a lot more time understanding the intuition for multiple programs/blocks/threads.

Code - github.com/JINO-ROHIT/adv…
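
For reference, GELU kernels commonly use the tanh approximation gelu(x) ~ 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). A minimal Triton sketch with illustrative names; tanh is computed from exp here to stay within basic tl ops.

import torch
import triton
import triton.language as tl

@triton.jit
def gelu_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    c = 0.7978845608028654                       # sqrt(2 / pi)
    inner = c * (x + 0.044715 * x * x * x)
    e = tl.exp(-2.0 * tl.abs(inner))             # tanh(|z|) = (1 - e^-2|z|) / (1 + e^-2|z|)
    t = tl.where(inner >= 0, (1.0 - e) / (1.0 + e), -(1.0 - e) / (1.0 + e))
    tl.store(out_ptr + offs, 0.5 * x * (1.0 + t), mask=mask)

def gelu(x, BLOCK_SIZE=1024):
    out = torch.empty_like(x)
    n = x.numel()
    gelu_kernel[(triton.cdiv(n, BLOCK_SIZE),)](x, out, n, BLOCK_SIZE=BLOCK_SIZE)
    return out

# x = torch.randn(4096, device="cuda")
# torch.allclose(gelu(x), torch.nn.functional.gelu(x, approximate="tanh"), atol=1e-6)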

jino_rohit: day 28 #100daysofcuda

Implemented the backward pass for the mish kernel and compared it with torch's backward gradients!
Almost fully equipped to use custom Triton kernels inside PyTorch modules.

Code - github.com/JINO-ROHIT/adv…
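
The usual way to plug a hand-written forward/backward pair into PyTorch is torch.autograd.Function, and gradcheck makes it easy to compare against torch's own gradients. A minimal sketch of that wiring; the math below uses plain PyTorch ops as stand-ins for the Triton kernels, so only the structure is representative, not the repo's kernels.

import torch
import torch.nn.functional as F

class Mish(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.tanh(F.softplus(x))   # mish(x) = x * tanh(softplus(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        t = torch.tanh(F.softplus(x))
        # d/dx [x * tanh(softplus(x))] = tanh(softplus(x)) + x * sigmoid(x) * (1 - tanh(softplus(x))^2)
        return grad_out * (t + x * torch.sigmoid(x) * (1.0 - t * t))

# x = torch.randn(64, dtype=torch.float64, requires_grad=True)
# torch.autograd.gradcheck(Mish.apply, (x,))   # compares against numerical gradients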

raghu_13B: Day 2/100 of my #100DaysOfCUDA challenge: Optimizing Matrix Multiplication!
Yesterday's "naive" kernel was slow. Why? Global memory latency.
Today, I implemented a tiled MatMul using shared memory. The strategy: fetch data from the slow "warehouse" (VRAM) in chunks (tiles)
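
For readers working in Triton rather than raw CUDA, the same tiling idea looks roughly like the sketch below: each program owns one output tile and marches along K in chunks, and the compiler stages those chunks in fast on-chip (shared) memory. The names, block sizes, and the assumption that M, N, K divide evenly by the block sizes are illustrative; this is not the tweet author's kernel.

import torch
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # Each program computes one BLOCK_M x BLOCK_N tile of C = A @ B (row-major inputs).
    pid_m = tl.program_id(axis=0)
    pid_n = tl.program_id(axis=1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):
        # Fetch one tile of A and one tile of B from global memory per iteration.
        a = tl.load(a_ptr + offs_m[:, None] * K + (k + offs_k)[None, :])
        b = tl.load(b_ptr + (k + offs_k)[:, None] * N + offs_n[None, :])
        acc += tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * N + offs_n[None, :], acc)

def matmul(a, b, BLOCK=32):
    M, K = a.shape
    _, N = b.shape
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, BLOCK), triton.cdiv(N, BLOCK))
    matmul_kernel[grid](a, b, c, M, N, K, BLOCK_M=BLOCK, BLOCK_N=BLOCK, BLOCK_K=BLOCK)
    return c

# a = torch.randn(256, 256, device="cuda"); b = torch.randn(256, 256, device="cuda")
# torch.allclose(matmul(a, b), a @ b, atol=1e-3)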

jino_rohit: day24 #100daysofcuda

Implemented quantized matrix multiplication in Triton.

When doing matrix multiplication with quantized neural networks, a common strategy is to store the weight matrix in lower precision, together with a shift and a scale term.

Code - github.com/JINO-ROHIT/adv…
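
One common flavor of the scale-and-shift scheme is asymmetric per-column quantization, where w ~ (w_q - shift) * scale. A small PyTorch sketch of that idea (dequantize, then multiply); a fused Triton kernel would instead do the dequantization per tile inside the matmul loop. The function names and the exact scheme are assumptions, not necessarily the repo's layout.

import torch

def quantize_per_column(w, bits=8):
    # Asymmetric per-column quantization: w ~ (w_q - shift) * scale.
    qmax = 2 ** bits - 1
    w_min = w.min(dim=0, keepdim=True).values
    w_max = w.max(dim=0, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    shift = torch.round(-w_min / scale)            # the zero point
    w_q = torch.clamp(torch.round(w / scale + shift), 0, qmax).to(torch.uint8)
    return w_q, scale, shift

def quantized_matmul(x, w_q, scale, shift):
    w_hat = (w_q.float() - shift) * scale          # dequantize on the fly
    return x @ w_hat

# w = torch.randn(512, 512); x = torch.randn(4, 512)
# w_q, scale, shift = quantize_per_column(w)
# (quantized_matmul(x, w_q, scale, shift) - x @ w).abs().max()   # small quantization error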

jino_rohit: day11 #100daysofcuda

Today I had some help from the Tensara team with debugging my previous kernels and realized they had a critical flaw. Fixed those two today.

For today's kernel, I wrote a SELU kernel and topped the Tensara leaderboard.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: day 29 #100daysofcuda

1. Implemented the mish activation forward and backward pass using a 2D launch grid in Triton.
2. Looking into PyTorch graph compilation.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: day27 #100daysofcuda

Today I implemented the mish activation function in Triton and benchmarked it against the PyTorch implementation. It was consistently ~5-8% faster than the torch implementation.

Code - github.com/JINO-ROHIT/adv…

Should I write beginner-friendly blogs for Triton?

day25 #100daysofcuda Today I put together the solutions for the Triton puzzles from professor @srush_nlp. Amazing puzzles: they start from basic principles and work you up to flash attention and complex matrix multiplications. Solutions here - github.com/JINO-ROHIT/adv…


I am going to start #100DaysofCUDA. I know some of this already, but let's become consistent again, so let's do it. Got motivation from @jino_rohit. Amazing repo, sir!!!!!! Let's go.


day14 #100daysofcuda We made it to two weeks of writing a CUDA kernel every day! Wrote a Triton kernel for softplus. Code - github.com/JINO-ROHIT/adv…
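
A naive softplus, log(1 + exp(x)), overflows for large x; the usual fix is the identity softplus(x) = max(x, 0) + log(1 + exp(-|x|)). A minimal Triton sketch with illustrative names, not the repo's kernel:

import torch
import triton
import triton.language as tl

@triton.jit
def softplus_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    # Stable form: exp() only ever sees non-positive arguments.
    out = tl.maximum(x, 0.0) + tl.log(1.0 + tl.exp(-tl.abs(x)))
    tl.store(out_ptr + offs, out, mask=mask)

def softplus(x, BLOCK_SIZE=1024):
    out = torch.empty_like(x)
    n = x.numel()
    softplus_kernel[(triton.cdiv(n, BLOCK_SIZE),)](x, out, n, BLOCK_SIZE=BLOCK_SIZE)
    return out

# x = torch.randn(4096, device="cuda") * 50
# torch.allclose(softplus(x), torch.nn.functional.softplus(x), atol=1e-5)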


jino_rohit: day8 #100daysofcuda

Implemented a memory-efficient version of dropout in Triton.

Code - github.com/JINO-ROHIT/adv…
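
The memory saving in this style of dropout comes from seeding the random mask per element offset, so the backward pass can regenerate the mask from (seed, offset) instead of storing a full mask tensor. A minimal sketch along the lines of the Triton low-memory dropout tutorial; names and defaults are illustrative.

import torch
import triton
import triton.language as tl

@triton.jit
def seeded_dropout_kernel(x_ptr, out_ptr, n_elements, p, seed, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    r = tl.rand(seed, offs)                     # same (seed, offset) -> same random number
    out = tl.where(r > p, x / (1.0 - p), 0.0)   # inverted dropout scaling
    tl.store(out_ptr + offs, out, mask=mask)

def seeded_dropout(x, p=0.5, seed=1234, BLOCK_SIZE=1024):
    out = torch.empty_like(x)
    n = x.numel()
    seeded_dropout_kernel[(triton.cdiv(n, BLOCK_SIZE),)](x, out, n, p, seed, BLOCK_SIZE=BLOCK_SIZE)
    return out

# Calling seeded_dropout(x, p, seed) twice with the same seed gives the same mask,
# which is exactly what the backward pass needs.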

"10 days into CUDA, and I’ve earned my first badge of honor! 🚀 From simple kernels to profiling, every day is a step closer to mastering GPU computing. Onward to 100! #CUDA #GPUProgramming #100DaysOfCUDA"

limbizzz11's tweet image. "10 days into CUDA, and I’ve earned my first badge of honor! 🚀 From simple kernels to profiling, every day is a step closer to mastering GPU computing. Onward to 100! #CUDA #GPUProgramming #100DaysOfCUDA"

Day16 #100daysofcuda 1. Kept it simple and implemented a gelu kernel. 2. Spent a lot more time understanding the intuition for multiple programs/blocks/threads. Code - github.com/JINO-ROHIT/adv…

jino_rohit's tweet image. Day16 #100daysofcuda

1. Kept it simple and implemented a gelu kernel.
2. Spent a lot more time understanding the intuition for multiple programs/blocks/threads.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: Day6 #100daysofcuda

Implemented a naive softmax kernel and spent some time learning the quirks of Triton kernels. Planning to optimize this kernel over the next few days.

Code - github.com/JINO-ROHIT/adv…

jino_rohit: day7 #100daysofcuda

Made it to day 7. Today certain concepts clicked: how Triton passes data in via meta-parameters, kernel warmup, striding, etc.

Implemented the fused softmax from the Triton guide.

Code - github.com/JINO-ROHIT/adv…
