emi

@gpuemi

co-founder @wafer_ai (yc s25)

san francisco, ca

wafer.ai/manifesto

Joined December 2015

2K Posts 1K Followers 2K Following

Pinned

emi

@gpuemi

Dec 20

(1/8) we’re launching the wafer vscode / cursor extension to help you develop, profile, and optimize gpu kernels as efficiently as possible. would love feedback from ppl writing cuda / cutlass/cute / training + inference perf folks. links to install below or at wafer dot ai

emi reposted

steve

@gpusteve

Jan 23

which models are the most hip?🪩 hip is amd's cuda alternative for gpu kernel programming, and sits at the center of amd's ml stack. we ran a model head-to-head comparison in kernel generation using hip to understand how today's frontier models perform.

emi reposted

steve

@gpusteve

Jan 21

if you're interested in perf engineering, we're looking for an intern/contractor GPU perf engineer to use Wafer's tools on real open-source projects (vLLM, SGLang, etc.). you will optimize open source AI infra with Wafer, and write up what worked, what didn't, and why.…

emi reposted

Elliot Arledge

@elliotarledge

Jan 6

This is my favorite clip of the new Elon pod. He opens up saying xAI struggles with memory usage/bandwidth and CUDA kernel optimization (matmul, attention, MoE, etc). If you are good at kernel or performance engineering in general, you should apply. Steer the world in a better…

emi reposted

ishan

@0xishand

Dec 28

Performant KV$ onboarding/offloading from all memory tiers is an insanely difficult systems problem. If you get it right, the unlocks are enormous. If work like this interests you, take a look at KVBlockManger (link in next tweet). It’s OSS + we’re hiring for it.

Zephyr

@zephyr_z9

Dec 28

Storing the humongous KV cache generated by reasoning models (1M+ Context length) on SRAM will never work. In fact, Nvidia will likely offload this KV cache to high-speed SSDs with 100M IOPS. For example, DeepSeek V3 requires 34.3 KB of storage per token. At 100k Context Length and…

emi reposted

Alfredo Andere

@AlfredoAndere

Dec 26

Made a website to easily visualize the results (which the team worked really hard on): benchmarks.bio. We believe that making LLMs better at biological data analysis is one of the best ways to make them more useful to scientists. We will continue to expand to more types…

Kenny Workman

@kenbwork

Dec 26

2026 will be the year of agents in biology. But we need better benchmarks. We worked with scientists to turn real world analysis into verifiable problems. SpatialBench stratifies frontier models, shows harnesses matter, and reveals distinct failure modes between model families:

emi reposted

Elliot Arledge

@elliotarledge

Dec 21

this is nuts

emi

@gpuemi

Dec 20

emi

@gpuemi

Nov 24

is there a tool to run 10+ claude codes on my repo at the same time?

emi reposted

steve

@gpusteve

Nov 19

avg kernel blog vs avg kernel programming experience

emi reposted

Michael Truell

@mntruell

Nov 12

After adopting Cursor, businesses merge ~40% more PRs each week. New economics research from the University of Chicago.

𝐒𝐮𝐩𝐫𝐨𝐭𝐞𝐞𝐦 𝐒𝐚𝐫𝐤𝐚𝐫

@SuproteemSarkar

Nov 10

Who uses AI agents? How do agents impact output? How might agents change work patterns? New working paper studies usage + impacts of coding agents (1/n)