
Basavasagar Patil

@basavasagar18

@UMRobotics Master's Student.

Basavasagar Patil reposted

Making everything one big CUDA graph helped wall clock consistency a lot on a laptop, but we still fought with power management. Looking forward to using an Nvidia Spark in the future. It isn’t in the repo code, but the single biggest win I have seen is an “action input” model…
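For reference, here is a minimal sketch of the kind of CUDA-graph capture the tweet alludes to, using PyTorch's torch.cuda.CUDAGraph. The toy policy, shapes, and buffers are placeholders for illustration, not the repo's actual code.

```python
# Minimal sketch (not the repo's code): capture a policy forward pass into one
# CUDA graph so per-step kernel-launch overhead stops jittering the wall clock.
import torch

policy = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.ReLU(), torch.nn.Linear(256, 8)
).cuda().eval()

static_obs = torch.zeros(1, 64, device="cuda")  # fixed input buffer for capture

# Warm up on a side stream so allocations settle before capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        policy(static_obs)
torch.cuda.current_stream().wait_stream(s)

# Capture one replayable graph.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_action = policy(static_obs)

# At run time: copy fresh data into the static buffer and replay the graph.
static_obs.copy_(torch.randn(1, 64, device="cuda"))
graph.replay()
action = static_action.clone()
```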


With some caveats*


Do VLAs really need large multi-billion parameter VLM backbones? Recent impressive work from TRI and BD seems to suggest maybe not.



The solution to Yann LeCun's drifting problem is a system prompt!!

New blog post by @AmanGokrani: Everyone says Claude Code "just works" like magic. He proxied its API calls to see what's happening. The secret? It's riddled with <system-reminder> tags that never let it forget what it's doing. (1/6) [🔗 link in final post with system prompt]

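A rough guess at the pattern the blog describes, sketched below: re-injecting a <system-reminder> into each request so the model keeps its current task in view. The helper name and reminder wording are made up for illustration; the actual tags Claude Code uses are in the linked post.

```python
# Hypothetical sketch of the drift-prevention trick the blog describes:
# append a <system-reminder> to every request so the model never "forgets".
def with_reminder(messages, task_summary):
    """Return a copy of the chat history with a reminder appended to the
    latest user turn. Tag name matches the blog; the text is illustrative."""
    reminder = (
        "<system-reminder>\n"
        f"Your current task: {task_summary}\n"
        "Stay focused on it; do not start unrelated work.\n"
        "</system-reminder>"
    )
    patched = [dict(m) for m in messages]
    patched[-1]["content"] = patched[-1]["content"] + "\n\n" + reminder
    return patched


history = [{"role": "user", "content": "Refactor utils.py into a package."}]
print(with_reminder(history, "refactor utils.py into a package")[-1]["content"])
```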


Basavasagar Patil reposted

PSA: if you work on plasticity loss you should read "Transient Non-stationarity and Generalisation in Deep Reinforcement Learning" by Igl et al. It's super relevant but suffers from an unfortunate lack of SEO due to predating the "plasticity loss" nomenclature.


Of course, they took the spotlight from Genie


The progress is crazy on this!!

Harder, Better, Faster, Stronger, Real-time! We are excited to reveal Genie 3, our most capable real-time foundational world model. Fantastic cross-team effort led by @jparkerholder and @shlomifruchter. Below some interactive worlds and capabilities that were highlights for me…



Lol, I just used Opus to rewrite my torch PPO code to JAX. Let's just say I was nowhere near confident, and I was right: it was full of compilation errors and bugs

"We recently merged a 22,000-line change to our production reinforcement learning codebase that was written heavily by Claude."



Basavasagar Patil reposted

Sell my computer

You’re given a MacBook, no job, no money. You have 30 days to make $1,000 online. What’s your plan?



Basavasagar Patil reposted

I am excited to announce that our AI institute (Institute for Foundations of Machine Learning, IFML) has been renewed. IFML was part of the first cohort of AI Institutes announced in 2020. Led by UT Austin, the new award will build on the trajectory of the past five years and…


Basavasagar Patil reposted

This very cool paper proposes an intriguing idea. If you use a small batch size, you can fine-tune LLMs with SGD or Adafactor (algorithms with very small memory overhead). But there is a small trap: Storage precision. Let's explore that. 🧵

🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n

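A minimal sketch of the setup both tweets describe: plain small-batch SGD with no momentum, keeping the weights in fp32 so the tiny per-step updates aren't rounded away (the "storage precision" trap). The model and data below are toy placeholders.

```python
# Toy sketch: small-batch vanilla SGD (no momentum, negligible optimizer state),
# computing in bf16 but storing/updating the master weights in fp32.
import torch

model = torch.nn.Linear(512, 512)                 # fp32 master weights
opt = torch.optim.SGD(model.parameters(), lr=3e-4, momentum=0.0)

for step in range(10):
    x = torch.randn(4, 512)                       # small batch
    with torch.autocast("cpu", dtype=torch.bfloat16):
        loss = model(x).pow(2).mean()             # bf16 compute
    loss.backward()
    opt.step()                                    # update lands on fp32 storage
    opt.zero_grad()
```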


Hierarchical RL!!!!

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
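The one-line description of RL in the quoted tweet ("this happened to go well, nudge its probability up") is roughly the REINFORCE policy-gradient update. A toy sketch, with placeholder policy outputs and returns:

```python
# Toy REINFORCE step: weight log-probabilities of sampled actions by how well
# they turned out, so good actions become slightly more likely.
import torch

logits = torch.randn(8, 4, requires_grad=True)    # 8 sampled steps, 4 actions
actions = torch.randint(0, 4, (8,))               # actions that were taken
returns = torch.randn(8)                          # how well each step went

log_probs = torch.log_softmax(logits, dim=-1)[torch.arange(8), actions]
advantage = returns - returns.mean()              # crude baseline
loss = -(log_probs * advantage).mean()            # push up what went well
loss.backward()                                   # grads nudge the policy
```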



Basavasagar Patil reposted

clickbait title


I'll discuss distributed learning on Saturday, July 12. First I'll cover current methods that need high bandwidth, then next-generation methods for decentralized learning.



Basavasagar Patil reposted

Before we got fancy optimizers like Muon and Shampoo that precondition with matmuls, optimizer steps were essentially entirely bandwidth-bound pointwise ops. In this regime, more optimizer steps by definition decrease your MFU, and vice versa if the speed increases more than your…

I never understood gradient accumulation; it seemed to me that doing more optimizer steps was clearly optimal, as you can mimic the accumulation behaviour with a properly chosen learning rate
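For concreteness, a minimal sketch of the gradient accumulation being debated here: N micro-batches feed one averaged optimizer step instead of N separate steps. The model and data are toy placeholders.

```python
# Toy gradient accumulation: 8 micro-batches per optimizer step, so the
# bandwidth-bound pointwise optimizer update runs 8x less often.
import torch

model = torch.nn.Linear(256, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
accum_steps = 8

for micro_step in range(64):
    x = torch.randn(16, 256)
    loss = model(x).pow(2).mean() / accum_steps   # scale so grads average
    loss.backward()                               # grads accumulate in .grad
    if (micro_step + 1) % accum_steps == 0:
        opt.step()                                # one update per 8 micro-batches
        opt.zero_grad()
```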



Basavasagar Patil reposted

(only tangentially related): Many interesting claims in empirical DL, including my own, are “not even wrong” — they are not stated precisely enough to even be hypothesis-tested. This is often by necessity: we don’t know formal definitions of the relevant objects. (Cont)

Grad students often feel pressure that doing science means only changing a-variable-at-a-time. No. You do that if you're testing a causal hypothesis. If you're exploring complex systems, you may maintain a mental model and update several variables at once in, eg, a Bayesian way.



Basavasagar Patil reposted

TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…


Basavasagar Patil reposted

Why you should stop working on RL research and instead work on product // The technology that unlocked the big scaling shift in AI is the internet, not transformers I think it's well known that data is the most important thing in AI, and also that researchers choose not to work…


Basavasagar Patil reposted

we are from New York

Share a piece of lore about yourself


