
Saurabh Garg

@saurabh_garg67

@thinkymachines | prev/ Researcher @MistralAI; PhD @mldcmu; CS @iitbombay (undergrad); Collab @GoogleAI @awscloud @apple

Saurabh Garg reposted

Today we’re announcing research and teaching grants for Tinker: credits for scholars and students to fine-tune and experiment with open-weight LLMs. Read more and apply at: thinkingmachines.ai/blog/tinker-re…


Saurabh Garg reposted

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…

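A minimal sketch of the idea, assuming HuggingFace-style student and teacher models with `.generate()` and `.logits` (these interfaces are my assumption, not the post's code): the student samples its own rollouts, as in RL, while the teacher grades every token, giving the dense supervision of SFT.

```python
import torch
import torch.nn.functional as F

def on_policy_distillation_step(student, teacher, prompts, optimizer,
                                max_new_tokens=128):
    # 1) Sample completions from the *student* so training stays on-policy
    #    (the error-correcting property of RL).
    with torch.no_grad():
        sequences = student.generate(prompts, max_new_tokens=max_new_tokens)

    # 2) Score every sampled token under both models (prompt masking and
    #    logit shifting omitted for brevity).
    student_logits = student(sequences).logits        # (batch, seq, vocab)
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits    # (batch, seq, vocab)

    # 3) Dense per-token supervision (the reward density of SFT):
    #    reverse KL(student || teacher) at every position.
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    reverse_kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1)

    loss = reverse_kl.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```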

Saurabh Garg reposted

Tinker is cool. If you're a researcher or developer, Tinker dramatically simplifies LLM post-training. You retain 90% of the algorithmic creative control (usually the data, the loss function, and the algorithm) while Tinker handles the hard parts that you usually want to touch much less…

Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!…

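A hypothetical mock of the programming model the announcement describes; these class and method names are invented for illustration and are not the real Tinker API. The point is the division of labor: the loop, the data, and the loss live in your local Python script, while forward/backward and optimizer steps would run on the service's distributed GPUs.

```python
class MockTrainingClient:
    """Stand-in for a remote fine-tuning service (hypothetical, not Tinker)."""

    def forward_backward(self, batch):
        # Real service: run forward+backward on remote GPUs, accumulate grads.
        return {"loss": sum(batch) / len(batch)}  # placeholder metric

    def optim_step(self, lr):
        # Real service: apply the accumulated gradients remotely.
        pass

client = MockTrainingClient()
for batch in [[2.0, 1.5], [1.2, 1.0]]:  # your data and your loop, run locally
    metrics = client.forward_backward(batch)
    client.optim_step(lr=1e-5)
    print(metrics["loss"])
```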


Saurabh Garg reposted

GPUs are expensive, and setting up the infrastructure to make them work properly is complex, which makes experimentation on cutting-edge models challenging for researchers and ML practitioners. Providing high-quality research tooling is one of the most effective ways to…


Saurabh Garg reposted

One interesting "fundamental" reason for Tinker today is the rise of MoE. Whereas hackers used to deploy llama3-70B efficiently on one node, modern deployments of MoE models require large multinode deployments for efficiency. The underlying reason? Arithmetic intensity. (1/5)

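
A back-of-envelope version of the thread's argument, with hypothetical model shapes and FP16 weights assumed: during batched decoding, each step streams the resident weights once but performs only ~2 FLOPs per *active* parameter per token, so a sparse MoE has far lower arithmetic intensity than a dense model at the same batch size and needs much larger (often multinode) deployments to stay compute-bound.

```python
def decode_arithmetic_intensity(total_params, active_params, batch_size,
                                bytes_per_param=2):
    """FLOPs per byte of weight traffic for one decode step (KV cache ignored)."""
    flops = 2 * active_params * batch_size        # ~2 FLOPs per active weight per token
    bytes_moved = total_params * bytes_per_param  # all resident weights stream once
    return flops / bytes_moved

# Dense 70B: every parameter is active, so intensity grows quickly with batch.
print(decode_arithmetic_intensity(70e9, 70e9, batch_size=32))   # 32 FLOPs/byte

# Hypothetical sparse MoE: ~1T total parameters, ~30B active per token.
# The same batch yields ~1 FLOP/byte, i.e. heavily bandwidth-bound.
print(decode_arithmetic_intensity(1e12, 30e9, batch_size=32))   # ~0.96 FLOPs/byte
```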




Saurabh Garg reposted

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…

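A minimal LoRA layer sketch, for orientation only (not the post's implementation): freeze the pretrained weight W and learn a low-rank update BA, so the layer computes x(W + (alpha/r)·BA)ᵀ with only A and B trainable.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: update starts at 0
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```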

Saurabh Garg reposted

Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.…

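One concrete instance of "manifold constraints on weight matrices", purely illustrative (the post develops its own, more general construction): keep a weight matrix on the Stiefel manifold of matrices with orthonormal columns by projecting the gradient to the tangent space and retracting after every step.

```python
import torch

def stiefel_retract(W):
    # Polar retraction: nearest matrix with orthonormal columns.
    U, _, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ Vh

def constrained_sgd_step(W, grad, lr=1e-2):
    # Riemannian gradient: remove the component that leaves the manifold.
    sym = W.T @ grad
    riemannian_grad = grad - W @ (sym + sym.T) / 2
    return stiefel_retract(W - lr * riemannian_grad)
```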

Saurabh Garg reposted

Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute. We find simple recipes that improve the asymptote of compute scaling laws to be 5x more data-efficient, offering better performance with sufficient compute.

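A sketch of the asymptote framing with made-up constants: if loss follows L(C) = E + A·C^(-alpha), "improving the asymptote" means lowering the floor E, which eventually dominates even if the recipe is worse at small compute.

```python
import numpy as np

def loss(C, E, A, alpha):
    # Irreducible floor E plus a power-law term that shrinks with compute C.
    return E + A * C ** (-alpha)

C = np.logspace(18, 26, 5)  # compute in FLOPs (hypothetical range)
baseline = loss(C, E=1.80, A=50.0, alpha=0.10)
lower_floor = loss(C, E=1.70, A=60.0, alpha=0.10)  # worse constant, better asymptote

for c, b, g in zip(C, baseline, lower_floor):
    print(f"C={c:.0e}  baseline={b:.3f}  lower_floor={g:.3f}")
# At small C the baseline wins; with sufficient compute the lower floor wins.
```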

Saurabh Garg reposted

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…

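A tiny demonstration of the general phenomenon behind the post's title (generic floating-point behavior, not the post's actual kernel analysis): floating-point addition is non-associative, so changing the reduction order, as GPU kernels do when batch size or tiling changes, changes the bits of the result.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_one_pass = np.sum(x)                                      # one reduction order
s_chunked = sum(np.sum(c) for c in np.array_split(x, 97))   # another order

print(s_one_pass == s_chunked)      # usually False
print(abs(s_one_pass - s_chunked))  # tiny, but enough to flip a sampled token
```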

Really excited about our focus on building multimodal AI that collaborates with humans the way humans collaborate with each other. It's been an amazing ~4 months building with a small, talented team. Come join us!

Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…



Saurabh Garg reposted

We have been working hard for the past 6 months on what I believe is the most ambitious multimodal AI program in the world. It is fantastic to see how pieces of a system that previously seemed intractable just fall into place. Feeling so lucky to create the future with this…




Saurabh Garg reposted

It's really fun to work with a talented yet small team. Our mission is ambitious: multimodal AI for collaborating with humans, so the best is yet to come! Join us, or fill out the application below if interested!






Saurabh Garg reposted

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
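
A hedged sketch of what "dynamic chunking" could look like, my illustration rather than the actual H-Net architecture: a learned head scores each byte as a chunk boundary, and bytes between boundaries are pooled into variable-length chunk vectors that stand in for tokenizer output. (A real model would need a differentiable or straight-through boundary decision to train end-to-end.)

```python
import torch
import torch.nn as nn

class DynamicChunker(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.byte_embed = nn.Embedding(256, d_model)
        self.boundary_head = nn.Linear(d_model, 1)

    def forward(self, byte_ids):
        h = self.byte_embed(byte_ids)                               # (T, d)
        boundary = torch.sigmoid(self.boundary_head(h)).squeeze(-1) > 0.5
        chunk_id = torch.cumsum(boundary.long(), dim=0)             # contiguous chunk ids
        n_chunks = int(chunk_id.max()) + 1
        # Mean-pool the bytes of each chunk into a single vector.
        sums = torch.zeros(n_chunks, h.size(-1)).index_add_(0, chunk_id, h)
        counts = torch.zeros(n_chunks).index_add_(
            0, chunk_id, torch.ones_like(chunk_id, dtype=torch.float))
        return sums / counts.unsqueeze(-1)                          # (n_chunks, d)
```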


Saurabh Garg reposted

Is your AI keeping up with the world? Announcing the #NeurIPS2025 CCFM Workshop: Continual and Compatible Foundation Model Updates. When/Where: Dec. 6-7, San Diego. Submission deadline: Aug. 22, 2025 (opening soon!) sites.google.com/view/ccfm-neur… #FoundationModels #ContinualLearning


Saurabh Garg reposted

Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.

lschmidt3's tweet image. Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.

Saurabh Garg reposted

Giving your models more time to think before prediction, e.g. via smart decoding, chain-of-thought reasoning, latent thoughts, etc., turns out to be quite effective for unlocking the next level of intelligence. New post is here :) “Why we think”: lilianweng.github.io/posts/2025-05-…
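
One simple "think longer" decoding strategy as a toy example, with a hypothetical model.sample() stub (my invention, just to make the shape concrete): sample several chain-of-thought answers and majority-vote instead of trusting one greedy decode.

```python
import collections

def sample_answer(model, prompt):
    # Hypothetical interface: one sampled chain-of-thought, final answer only.
    return model.sample(prompt, temperature=0.8)

def self_consistency(model, prompt, n=16):
    votes = collections.Counter(sample_answer(model, prompt) for _ in range(n))
    answer, _ = votes.most_common(1)[0]
    return answer
```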


Saurabh Garg reposted

Was fun hosting an informal IITB CS get-together in SF. We still argue about which hostel is the best 🙃 The oldest person was born in 1992 and the youngest was a decade younger.


Saurabh Garg reposted

Introducing the world's best OCR model! mistral.ai/news/mistral-o…

