
Adit Jain

@aditjain1980

PhD Candidate @ Cornell ECE. Interested in Machine Learning and Reinforcement Learning.

Pinned

1/ Chain of thought reasoning can be significantly improved using RLVR, but can we improve the generation process for reasoning tokens during training for better exploration, efficiency, and performance? @brendanh0gan and I explore this question in our recent work 🧵 (tldr: yes!)

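Aside: a minimal sketch of the "verifiable reward" idea that RLVR refers to - the policy earns reward only when its final answer checks out against ground truth. The function name, the \boxed{} answer format, and the exact-match rule below are illustrative assumptions, not the setup from the paper.

import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the completion's boxed final answer matches the ground truth, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable final answer -> no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# Example: a chain of thought that ends in \boxed{42}
print(verifiable_reward(r"... so the result is \boxed{42}", "42"))  # prints 1.0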

Sign up as a reviewer or AE if you can! I have been a reviewer for TMLR for almost 2 years now and it has been a very positive learning experience.

As Transactions on Machine Learning Research (TMLR) grows in number of submissions, we are looking for more reviewers and action editors. Please sign up! Only one paper to review at a time and <= 6 per year; reviewers report greater satisfaction than reviewing for conferences!

Very cool work!

We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: sakana.ai/shinka-evolve/ Code: github.com/SakanaAI/Shink… Like AlphaEvolve and its variants, our framework leverages LLMs to…



Adit Jain reposted

My acceptance speech at the Turing award ceremony: Good evening ladies and gentlemen. The main idea of reinforcement learning is that a machine might discover what to do on its own, without being told, from its own experience, by trial and error. As far as I know, the first…


Gemma-3 270M has interesting collapse behavior - it uses words from different Indian languages - Hindi, Tamil, Marathi, Bangla (the task is English-based) - perhaps a multilingual pretraining quirk?


Adit Jain reposted

introducing qqWen: our fully open-sourced project (code+weights+data+detailed technical report) for full-stack finetuning (pretrain+SFT+RL) of a series of models (1.5b, 3b, 7b, 14b & 32b) for a niche financial programming language called Q. All details below!


the em-dashes live on

