zkash

@asyncakash

learning to make gpus go brrr | 🦀 | prev: @availproject, @puffer_finance, @class_lambda, topology | alum @iitroorkee

Joined May 2022

1K posts · 412 followers · 299 following

zkash reposted

Gradient

@Gradient_HQ

19h

Open Sourcing Parallax: Your Sovereign AI OS. The easiest way to host AI applications that are entirely yours.

zkash

@asyncakash

Oct 23

and still claude gives you uv pip

tuna🍣

@tunahorse21

Oct 22

NO CLAUDE I SAID USE UV

zkash

@asyncakash

Oct 20

the chatgpt deep research task I requested last night is still running lol Seems like the scheduler kicked off my task before completion welp 🥲 @OpenAI compensate me with one free dr query now 😤

zkash reposted

Nikunj Kothari

@nikunj

Oct 10

Some unstructured thoughts on what creates abundance mindset..

zkash reposted

@charles_irl

New post in the GPU 𝕻𝖊𝖗𝖋𝖔𝖗𝖒𝖆𝖓𝖈𝖊 Glossary on memory coalescing -- a hardware feature that CUDA programmers need to mind to get anywhere near full memory bandwidth utilization.

The article includes a quick µ-benchmark, reproducible with Godbolt. What a tool!

zkash reposted

Alex L Zhang

@a1zhang

Sep 30

it's insane to me how little attention the llm.q repo has

it's a fully C/C++/CUDA implementation of multi-gpu (zero + fsdp), quantized LLM training with support for selective AC

it's genuinely the coolest OSS thing I've seen this year (what's crazier is 1 person wrote it!)

zkash reposted

Charles 🎉 Frye

@charles_irl

Sep 26

We reverse-engineered Flash Attention 4.

zkash

@asyncakash

Sep 25

Really enjoyed @samsja19’s talk on the challenges of decentralized training (e.g. DiLoCo) under low-bandwidth conditions. Was surprised to learn how much weather can destabilize training 🤯 @PrimeIntellect is doing some wild stuff with decentralized RL! 🚀 Thanks for the…

zkash reposted

Ben Burtenshaw

@ben_burtenshaw

Sep 24

too much new learning material! we're releasing a few chapters of hard study on post training AI models. it covers all major aspects plus more to come.

- Evaluating Large Language models on benchmarks and custom use cases
- Preference Alignment with DPO
- Fine tuning Vision…

zkash reposted

Alex L Zhang

@a1zhang

Sep 13

hi! if you’re interested in using or writing mega kernels for AI (one big GPU kernel for an entire model) you should tune in to today’s @GPU_MODE livestream

today in ~3 hours we have the authors of MPK talking about their awesome new compiler for mega kernels!

see you there :)

zkash reposted

JingyuanLiu

@JingyuanLiu123

Sep 13

I was lucky to work in both China and the US LLM labs, and I've been thinking this for a while. The current values of pretraining are indeed different:

US labs be like:
- lots of GPUs and much larger flops run
- Treating stabilities more seriously, and could not tolerate spikes…

Charuru Charuru

@CharuruCha14310

Sep 13

I bet OpenAI/xAI is laughing so hard, this result is obvious tbh, they took a permanent architectural debuff in order to save on compute costs.

zkash

@asyncakash

Sep 12

Qwen is basically the Samsung (smartphone) of llms. They ship nice new models every month.

Ahmad

@TheAhmadOsman

Sep 12

China saved opensource LLMs, some notable releases from July only

> Kimi K2
> Qwen3 235B-A22B-2507
> Qwen3 Coder 480B-A35B
> Qwen3 235B-A22B-Thinking-2507
> GLM-4.5
> GLM-4.5 Air
> Qwen3 30B-A3B-2507
> Qwen3 30B-A3B-Thinking-2507
> Qwen3 Coder 30B-A3B

US & EU need to do better

zkash

@asyncakash

Sep 12

imagine trying to “learn to code” in cursor when the tab key is basically god mode 💀

Cursor

@cursor_ai

Sep 11

We've trained a new Tab model that is now the default in Cursor. This model makes 21% fewer suggestions than the previous model while having a 28% higher accept rate for the suggestions it makes. Learn more about how we improved Tab with online RL.

zkash

@asyncakash

Sep 11

ai bros really out here teaching each other how to draw assholes 😭

zkash

@asyncakash

Sep 11

chinese ai labs slaying it 🔥

Qwen

@Alibaba_Qwen

Sep 11

🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here! 🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B.(esp. @ 32K+ context!) 🔹Hybrid Architecture: Gated DeltaNet + Gated Attention → best of speed &…

zkash

@asyncakash

Sep 11

Just had the most amazing Transformers (with flash attention) lecture from @danielhanchen — he broke down the guts of Transformers and walked us through the full backprop step-by-step, all by hand. Huge thanks to @TheZachMueller for organizing!

zkash reposted

Elliot Arledge (h/eng)

@elliotarledge

Sep 10

DO NOT buy a gpu to write kernels. use @modal notebooks. take 2 mins out of your day to learn this simple trick and kick off your work without paying a shit ton for electricity or cloud gpu run 24/7

zkash reposted

Aditya Sharma

@Adityaaa_Sharma

Sep 8

🚨 career update i’ve joined @bulletxyz_ to build the growth engine driving the next million on-chain traders. excited to build a @solana native trading layer that brings CEX performance fully on-chain. more ↓