zkash

@asyncakash

learning to make gpus go brrr | 🦀 | prev: @availproject, @puffer_finance, @class_lambda, topology | alum @iitroorkee

Joined May 2022

1K Posts 412 Followers 297 Following

zkash

@asyncakash

3h

and still claude gives you uv pip

tuna🍣

@tunahorse21

Oct 22

NO CLAUDE I SAID USE UV

zkash

@asyncakash

Oct 20

the chatgpt deep research task I requested last night is still running lol. Seems like the scheduler kicked off my task before completion, welp 🥲 @OpenAI compensate me with one free dr query now 😤

zkash reposted

Nikunj Kothari

@nikunj

Oct 10

Some unstructured thoughts on what creates abundance mindset..

zkash

@asyncakash

Oct 10

what’s with the meteoric pump of $zec while the entire market is bleeding red?!

zkash reposted

@charles_irl

New post in the GPU 𝕻𝖊𝖗𝖋𝖔𝖗𝖒𝖆𝖓𝖈𝖊 Glossary on memory coalescing -- a hardware feature that CUDA programmers need to mind to get anywhere near full memory bandwidth utilization.

The article includes a quick µ-benchmark, reproducible with Godbolt. What a tool!

zkash reposted

Alex L Zhang

@a1zhang

Sep 30

it's insane to me how little attention the llm.q repo has

it's a fully C/C++/CUDA implementation of multi-gpu (zero + fsdp), quantized LLM training with support for selective AC

it's genuinely the coolest OSS thing I've seen this year (what's crazier is 1 person wrote it!)

zkash reposted

Charles 🎉 Frye

@charles_irl

Sep 26

We reverse-engineered Flash Attention 4.

zkash

@asyncakash

Sep 25

Really enjoyed @samsja19’s talk on the challenges of decentralized training (e.g. DiLoCo) under low-bandwidth conditions. Was surprised to learn how much weather can destabilize training 🤯 @PrimeIntellect is doing some wild stuff with decentralized RL! 🚀 Thanks for the…

zkash reposted

Ben Burtenshaw

@ben_burtenshaw

Sep 24

too much new learning material! we're releasing a few chapters of hard study on post training AI models. it covers all major aspects plus more to come.

- Evaluating Large Language models on benchmarks and custom use cases
- Preference Alignment with DPO
- Fine tuning Vision…

zkash reposted

Alex L Zhang

@a1zhang

Sep 13

hi! if you’re interested in using or writing mega kernels for AI (one big GPU kernel for an entire model) you should tune in to today’s @GPU_MODE livestream

today in ~3 hours we have the authors of MPK talking about their awesome new compiler for mega kernels!

see you there :)

zkash reposted

JingyuanLiu

@JingyuanLiu123

Sep 13

I was lucky to work in both China and the US LLM labs, and I've been thinking this for a while. The current values of pretraining are indeed different:

US labs be like:
- lots of GPUs and much larger flops run
- Treating stabilities more seriously, and could not tolerate spikes…

Charuru Charuru

@CharuruCha14310

Sep 13

I bet OpenAI/xAI is laughing so hard, this result is obvious tbh, they took a permanent architectural debuff in order to save on compute costs.

zkash

@asyncakash

Sep 12

Qwen is basically the Samsung (smartphone) of llms. They ship nice new models every month.

Ahmad

@TheAhmadOsman

Sep 12

China saved opensource LLMs, some notable releases from July only

> Kimi K2
> Qwen3 235B-A22B-2507
> Qwen3 Coder 480B-A35B
> Qwen3 235B-A22B-Thinking-2507
> GLM-4.5
> GLM-4.5 Air
> Qwen3 30B-A3B-2507
> Qwen3 30B-A3B-Thinking-2507
> Qwen3 Coder 30B-A3B

US & EU need to do better

zkash

@asyncakash

Sep 12

imagine trying to “learn to code” in cursor when the tab key is basically god mode 💀

Cursor

@cursor_ai

Sep 11

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.

zkash

@asyncakash

Sep 11

ai bros really out here teaching each other how to draw assholes 😭

zkash

@asyncakash

Sep 11

chinese ai labs slaying it 🔥

Qwen

@Alibaba_Qwen

Sep 11

🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!

🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B.(esp. @ 32K+ context!)
🔹Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…

zkash

@asyncakash

Sep 11

Just had the most amazing Transformers (with flash attention) lecture from @danielhanchen — he broke down the guts of Transformers and walked us through the full backprop step-by-step, all by hand. Huge thanks to @TheZachMueller for organizing!

zkash reposted

Elliot Arledge (h/eng)

@elliotarledge

Sep 10

DO NOT buy a gpu to write kernels. use @modal notebooks. take 2 mins out of your day to learn this simple trick and kick off your work without paying a shit ton for electricity or cloud gpu run 24/7

zkash reposted

Aditya Sharma

@Adityaaa_Sharma

Sep 8

🚨 career update i’ve joined @bulletxyz_ to build the growth engine driving the next million on-chain traders. excited to build a @solana native trading layer that brings CEX performance fully on-chain. more ↓