harshith

@theharshithh

parallel tensor splitter @simplismartHQ, prev @onfinance_ai

Pinned

just shipped a repo for training speculative decoding heads to speed up llm inference by ~3x. get any base model, train a few speculative heads, see the difference in throughput. 🧵 with more details. 1/n

how does cloudflare handle almost all cdn traffic and still shoot itself in the foot multiple times over? first react useEffect, now sql filtering.

@swyx: cloudflare outage was due to one bad SQL statement that baked in an assumption it shouldn't have.

can you spot the bug here? no. because SQL does not Make Wrong Code Look Wrong.

sometimes i wonder how many SEVs, performance issues and privacy leaks happen because we took a query…


colab in vs-code/cursor should be that 100x outcome feature.


please take this advice from me and do not waste weeks on @LiteLLM. it JUST DOESN'T WORK. 1. core repo is 200mb, where 180mb is images and gifs. 2. we had to contribute to our forks and install to get it working. how is it so hard to write python transformation fns. just write…


instagram's codebase is ~20mn lines of python. do what u will with this information.

js devs, im sorry.
python is really all you need.

harshith reposted

@simran_s_arora: AI has been built on one vendor’s stack for too long.
AMD’s GPUs now offer state-of-the-art peak compute and memory bandwidth, but the lack of mature software / the “CUDA moat” keeps that power locked away. Time to break it and ride into our multi-silicon future. 🌊

It's been a…

scaling rl is more of an infra problem than an ml problem. apart from the fact that training is just waiting for rollouts to be completed and updating.

and ladies and gentlemen, it's all compute and data. cheetah was a previous checkpoint of composer. build your platform, collect data, train and win!

testing something here. ignore

Makeup ate today



here is something I worked on a few months ago: you can train a few speculative decoding heads and run inference with them. end to end setup. can be modified to train eagle spec heads. leave a star if you found it helpful! github.com/theharshith/sp…

@prajdabre: Here's your weekend challenge: Implement speculative decoding.

Step 1: Read the following paper and/or blog: arxiv.org/abs/2211.17192 galacodes.hashnode.dev/speculative-de… (cc @jaygala223)
Step 2: Choose a family of models which come in various sizes. My choice would be the Gemma3 or Qwen…


bro is pitching the competitor of loom to the founder of loom lmao

harshith reposted

John Wick of CUDA kernels.

@scaling01: of course merged by the 500IQ Tsinghua GOAT himself


“win the internet for a day” as a service

the more i think about this, the more it makes sense. so much value unlock for everyone in general.

i wonder how they plan to move data in and out. nasa's latest number is 900mbps. assuming it would be for large scale pretraining, they would launch "space jobs". not sure if this is helpful for realtime inference. exciting time to be alive

@sundarpichai: Our TPUs are headed to space!

Inspired by our history of moonshots, from quantum computing to autonomous driving, Project Suncatcher is exploring how we could one day build scalable ML compute systems in space, harnessing more of the sun’s power (which emits more power than 100…


"While you’re crafting the perfect launch tweet, support tickets pile up. While you’re updating your bio with “Backed by [Famous Fund],” customers churn. While you’re performing success, someone boring is building" I sort of agree with what @nikunj is saying here, but also,…


no we will take care of inference, you just release those weights

"Open-sourcing does not equal free; running inference servers comes with costs." U need to buy a computer to run Linux too "Open-sourcing the weights of large models is different from open-source software; there is no reverse contribution from the developer community."…



harshith reposted

Tell me you don't know about AI infra without telling me you don't know about AI infra.

The future of AI engineering is TypeScript, not Python.



NCCL is pronounced as "nickel"???? are we for real?

