
zkash

@asyncakash

learning to make gpus go brrr | 🦀 | prev: @availproject, @puffer_finance, @class_lambda, topology | alum @iitroorkee

first forgot to claim TIA, this time forgot to claim MON 😵 am i cooked chat 😭


Binance providing euthanasia

riddle245's tweet image.


zkash reposted

Billionaire glowups are way too predictable. I’m sure Vitalik is gonna surprise us in 10 years if not less 🔥

naygozalova's tweet image.

why is elon looking like his viral chinese parody guy

Let’s fucking goooooo! 3+ hours with the great rocket man @elonmusk open.spotify.com/episode/6vBr2k…

joerogan's tweet image.


zkash reposted

Open Sourcing Parallax: Your Sovereign AI OS. The easiest way to host AI applications that are entirely yours.

Gradient_HQ's tweet image.

and still claude gives you uv pip

NO CLAUDE I SAID USE UV

tunahorse21's tweet image.


the chatgpt deep research task I requested last night is still running lol. Seems like the scheduler kicked off my task before completion, welp 🥲 @OpenAI compensate me with one free dr query now 😤


zkash reposted

Some unstructured thoughts on what creates an abundance mindset...

nikunj's tweet image.

zkash reposted

New post in the GPU 𝕻𝖊𝖗𝖋𝖔𝖗𝖒𝖆𝖓𝖈𝖊 Glossary on memory coalescing -- a hardware feature that CUDA programmers need to mind to get anywhere near full memory bandwidth utilization. The article includes a quick µ-benchmark, reproducible with Godbolt. What a tool!

charles_irl's tweet image.
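
The linked glossary post isn't reproduced here, but a toy version of such a µ-benchmark is easy to sketch. The CUDA below is my own illustrative sketch, not the article's code: a coalesced copy kernel next to a strided one, timed with CUDA events; the kernel names, sizes, and stride are made up for the example.

// Toy µ-benchmark: coalesced vs. strided global loads.
// Illustrative sketch only, not the glossary article's benchmark.
#include <cstdio>
#include <cuda_runtime.h>

// Adjacent threads read adjacent addresses, so each warp's 32 loads
// coalesce into a few wide memory transactions.
__global__ void copy_coalesced(const float* __restrict__ in,
                               float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Adjacent threads read addresses `stride` floats apart, so each lane
// touches a different memory segment and the warp issues far more
// transactions for the same amount of useful data.
__global__ void copy_strided(const float* __restrict__ in,
                             float* __restrict__ out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = (int)(((long long)i * stride) % n);
    if (i < n) out[i] = in[j];
}

int main() {
    const int n = 1 << 24;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    dim3 block(256), grid((n + block.x - 1) / block.x);
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    cudaEventRecord(t0);
    copy_coalesced<<<grid, block>>>(in, out, n);
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    float ms_coalesced; cudaEventElapsedTime(&ms_coalesced, t0, t1);

    cudaEventRecord(t0);
    copy_strided<<<grid, block>>>(in, out, n, 32);
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    float ms_strided; cudaEventElapsedTime(&ms_strided, t0, t1);

    printf("coalesced: %.3f ms, strided: %.3f ms\n", ms_coalesced, ms_strided);
    return 0;
}

On most GPUs the strided variant typically lands at a small fraction of the coalesced bandwidth, which is the effect the post measures.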

zkash reposted

it's insane to me how little attention the llm.q repo has

it's a fully C/C++/CUDA implementation of multi-GPU (ZeRO + FSDP), quantized LLM training with support for selective AC

it's genuinely the coolest OSS thing I've seen this year (what's crazier is 1 person wrote it!)

a1zhang's tweet image.
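
None of llm.q's code is excerpted above, so purely as a loose illustration (my own sketch, not the repo's), here is the kind of primitive a quantized-training codebase builds on: a per-tensor symmetric int8 quantize/dequantize pair in CUDA. The kernel names and the fixed scale are assumptions for the example.

// Illustrative sketch only, NOT code from llm.q.
// Per-tensor symmetric int8 quantize/dequantize, a typical building
// block for quantized training (scale chosen on the host, e.g. absmax/127).
#include <cstdint>
#include <cuda_runtime.h>

__global__ void quantize_int8(const float* __restrict__ x,
                              int8_t* __restrict__ q,
                              float inv_scale, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i] * inv_scale;                    // map into [-127, 127]
        v = fminf(fmaxf(v, -127.f), 127.f);            // clamp
        q[i] = static_cast<int8_t>(__float2int_rn(v)); // round to nearest
    }
}

__global__ void dequantize_int8(const int8_t* __restrict__ q,
                                float* __restrict__ x,
                                float scale, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = static_cast<float>(q[i]) * scale;
}

int main() {
    const int n = 1 << 20;
    float *x, *y; int8_t *q;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMalloc(&q, n * sizeof(int8_t));

    const float scale = 0.05f;  // placeholder; normally absmax(x) / 127
    dim3 block(256), grid((n + block.x - 1) / block.x);
    quantize_int8<<<grid, block>>>(x, q, 1.0f / scale, n);
    dequantize_int8<<<grid, block>>>(q, y, scale, n);
    cudaDeviceSynchronize();
    return 0;
}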

zkash reposted

We reverse-engineered Flash Attention 4.

charles_irl's tweet image.

Really enjoyed @samsja19’s talk on the challenges of decentralized training (e.g. DiLoCo) under low-bandwidth conditions. Was surprised to learn how much weather can destabilize training 🤯 @PrimeIntellect is doing some wild stuff with decentralized RL! 🚀 Thanks for the…


zkash reposted

too much new learning material! we're releasing a few chapters of hard study on post training AI models. it covers all major aspects plus more to come.

- Evaluating Large Language models on benchmarks and custom use cases
- Preference Alignment with DPO
- Fine tuning Vision…

ben_burtenshaw's tweet image.

zkash reposted

hi! if you’re interested in using or writing mega kernels for AI (one big GPU kernel for an entire model) you should tune in to today’s @GPU_MODE livestream

today in ~3 hours we have the authors of MPK talking about their awesome new compiler for mega kernels!

see you there :)

a1zhang's tweet image.
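
For anyone new to the term, here is a minimal sketch of the idea behind mega kernels (mine, not MPK's compiler output): instead of launching one kernel per op and round-tripping intermediates through global memory, fuse the sequence into a single kernel. A real mega kernel pushes this across an entire model with on-chip buffers and inter-block synchronization; the toy below just fuses two elementwise ops.

// Toy illustration of kernel fusion, the core of the mega-kernel idea.
// Hand-written sketch; unrelated to MPK's actual generated code.
#include <cuda_runtime.h>

// Unfused pipeline: two launches, intermediate 'tmp' lives in global memory.
__global__ void scale_k(const float* x, float* tmp, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = a * x[i];
}
__global__ void bias_relu_k(const float* tmp, float* y, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = fmaxf(tmp[i] + b, 0.f);
}

// Fused version: one launch, the intermediate stays in a register.
__global__ void scale_bias_relu_fused(const float* x, float* y,
                                      float a, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t = a * x[i];          // never written to global memory
        y[i] = fmaxf(t + b, 0.f);
    }
}

int main() {
    const int n = 1 << 20;
    float *x, *tmp, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&tmp, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    dim3 block(256), grid((n + block.x - 1) / block.x);

    // Unfused: two launches plus a global-memory round trip.
    scale_k<<<grid, block>>>(x, tmp, 2.0f, n);
    bias_relu_k<<<grid, block>>>(tmp, y, 1.0f, n);

    // Fused: one launch, no intermediate buffer needed.
    scale_bias_relu_fused<<<grid, block>>>(x, y, 2.0f, 1.0f, n);

    cudaDeviceSynchronize();
    return 0;
}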

zkash reposted

I was lucky to work in both China and the US LLM labs, and I've been thinking this for a while. The current values of pretraining are indeed different:

US labs be like:
- lots of GPUs and much larger flops run
- Treating stabilities more seriously, and could not tolerate spikes…

I bet OpenAI/xAI is laughing so hard, this result is obvious tbh, they took a permanent architectural debuff in order to save on compute costs.



Qwen is basically the Samsung (smartphone) of LLMs. They ship nice new models every month.

China saved opensource LLMs, some notable releases from July only
> Kimi K2
> Qwen3 235B-A22B-2507
> Qwen3 Coder 480B-A35B
> Qwen3 235B-A22B-Thinking-2507
> GLM-4.5
> GLM-4.5 Air
> Qwen3 30B-A3B-2507
> Qwen3 30B-A3B-Thinking-2507
> Qwen3 Coder 30B-A3B

US & EU need to do better



imagine trying to “learn to code” in cursor when the tab key is basically god mode 💀

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.



ai bros really out here teaching each other how to draw assholes 😭

asyncakash's tweet image.
