My student @AngelikiGiannou is on the job market and she's the one that wrote the OG Looped Transformer paper. I strongly encourage you to read it, it's a tour de force! While I won't claim it influenced work on test-time compute, it 100% anticipated directions the community is…
Can transformers follow instructions? We explore this in: "Looped Transformers as Programmable Computers" arxiv.org/abs/2301.13196 led by Angeliki (@AngelikiGiannou) and Shashank (@shashank_r12) in collaboration with @Kangwook_Lee and @jasondeanlee Here is a 🧵
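Not the paper's construction, but to make the looping idea concrete: a minimal, hypothetical PyTorch sketch of a single weight-tied block applied repeatedly, so the same parameters are reused at every "execution" step over a sequence holding instructions plus a scratchpad.

```python
# Minimal sketch of the looped-transformer idea: one weight-tied block applied
# repeatedly to a sequence of instruction + scratchpad tokens. Illustrative toy
# only, not the programmable-computer construction from the paper.
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_loops=8):
        super().__init__()
        # A single shared block; looping reuses the same weights each step.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_loops = n_loops

    def forward(self, x):
        # x: (batch, seq_len, d_model) holding instruction + scratchpad tokens.
        for _ in range(self.n_loops):
            x = self.block(x)  # each pass plays the role of one "execution" step
        return x

model = LoopedTransformer()
out = model(torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```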
The MixAttention blog post from Databricks/Mosaic is great: databricks.com/blog/mixattent…
@jefrankle frantically asks us not to whisper the words that won him this very well-deserved award!!! But here’s one for you - “Lottery Tickets”!!!
1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens 🧑🏼‍🍳
- 3B LLMs beat 8B models 🚀
- Pareto frontier for performance
🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference!
Core components of NSA:
• Dynamic hierarchical sparse strategy
• Coarse-grained token compression
• Fine-grained token selection
💡 With…
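To make the compression/selection split concrete, here is a deliberately naive single-query sketch: block size, top-k, and the equal-weight merge of the coarse and fine paths are my own simplifications, not the NSA design.

```python
# Toy sketch of the two ingredients named above: coarse-grained block
# compression plus fine-grained block selection. Not the NSA architecture.
import torch
import torch.nn.functional as F

def sparse_attention_sketch(q, k, v, block=16, topk=2):
    # q: (d,), k/v: (T, d); single query, single head for clarity.
    T, d = k.shape
    kb = k.view(T // block, block, d)          # group keys into blocks
    vb = v.view(T // block, block, d)
    k_coarse = kb.mean(dim=1)                  # coarse compression: one key per block
    v_coarse = vb.mean(dim=1)

    # Score blocks with the compressed keys, keep only the top-k blocks.
    block_scores = k_coarse @ q / d ** 0.5
    keep = block_scores.topk(topk).indices

    # Fine-grained path: full attention over tokens of the selected blocks only.
    k_fine = kb[keep].reshape(-1, d)
    v_fine = vb[keep].reshape(-1, d)
    attn = F.softmax(k_fine @ q / d ** 0.5, dim=0)
    fine_out = attn @ v_fine

    # Coarse path: attention over the compressed block summaries.
    attn_c = F.softmax(block_scores, dim=0)
    coarse_out = attn_c @ v_coarse
    return 0.5 * fine_out + 0.5 * coarse_out   # naive equal-weight merge

T, d = 64, 32
out = sparse_attention_sketch(torch.randn(d), torch.randn(T, d), torch.randn(T, d))
print(out.shape)  # torch.Size([32])
```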
We are announcing Open Thoughts, our large-scale open-source effort to curate the best open reasoning datasets! DeepSeek-R1 is amazing but we still don't have access to high-quality open reasoning datasets. These datasets are crucial if you want to build your own reasoning models!…
Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 recipe. The model outperforms Sky-T1 and o1-preview in reasoning (Math and Code) benchmarks and almost reaches the performance of DeepSeek-R1-Distill-Qwen-32B while…
We are happy to announce Curator, an open-source library designed to streamline synthetic data generation! High-quality synthetic data generation is essential in training and evaluating LLMs/agents/RAG pipelines these days, but tooling around this is still entirely lacking! So…
Nice to see my previous work that I led at Google DeepMind covered by VentureBeat (in light of new work from Meta). Context: We had introduced the novel idea of Generative Retrieval for recommender systems to the world in our NeurIPS 2023 paper called TIGER (Transformer…
It's finally here! Excited to share the project I led with KRAFTON and NVIDIA. The future of gaming is here 🙌
Transform solo gameplay into a seamless team experience with PUBG Ally. KRAFTON & NVIDIA have teamed up to create the world’s first Co-Playable Character (CPC), built with NVIDIA ACE → nvda.ws/3W4kzhJ
Watch the full conversation: youtu.be/2tlWPgmiX2s?si…
Mixed Attention & LLM Context | Data Brew | Episode 35
Databricks research scientist @shashank_r12 shares approaches in LLMs:
- How RAG enhances accuracy
- Evolution of attention mechanisms
- Practical applications & trade-offs of Mamba architectures
Soo disappointed that it's just a "department" and not a School, College or an Institute.. gotta get ahead of the curve, @IITKgp!!
I have three Ph.D. student openings in my research group at @RutgersCS starting in Fall 2025. If you are interested in working with me on efficient algorithms and systems for LLMs, foundation models, and AI4Science, please apply at: grad.rutgers.edu/academics/prog… The deadline is…
🧵 Super proud to finally share this work I led last quarter - the @Databricks Domain Intelligence Benchmark Suite (DIBS)! TL;DR: Academic benchmarks ≠ real performance and domain intelligence > general capabilities for enterprise tasks. 1/3
i'm somewhat confident that both the following properties will hold of language models in 2027:
1. tokenization will be gone, replaced with byte-level ingestion
2. all tokens that don't need to be read or written by a human will be continuous vectors
luckily two interesting…
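A tiny illustration of point 1: byte-level ingestion needs no learned vocabulary, since every string already maps losslessly to integers in [0, 256).

```python
# Byte-level "tokenization": every string maps to UTF-8 byte IDs in [0, 256),
# with a lossless round trip and no trained tokenizer involved.
text = "tokenization will be gone 🤖"
byte_ids = list(text.encode("utf-8"))       # e.g. [116, 111, 107, ...]; the emoji becomes 4 bytes
decoded = bytes(byte_ids).decode("utf-8")   # exact round trip back to the original string
assert decoded == text
print(len(text), len(byte_ids))             # the byte sequence is longer than the character count
```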
At NeurIPS early? Like making GPUs go brrr? Join me at a luncheon tomorrow on LLM Scaling x Efficiency, 5 mins from the conference center... Note, folks need to have directly relevant work if not in the field. DM me for more info or for recs! Per the usual, I'll be doing 3…
I'll be at NeurIPS and would love to chat about anything AI. Also, visit the Databricks booth to check out some of the work we've been doing! databricks.com/blog/databrick…
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…
🤔 How can we achieve GPT-3 175B-level performance with only 1.3B parameters? 🌟 New from #NVIDIAResearch: HYMBA (HYbrid Multi-head Bi-Attention) combines MLP and attention mechanisms to dramatically boost small language model capabilities. HYMBA could revolutionize NLP…
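Reading the tweet's description literally, a toy block that mixes an attention path and an MLP path in parallel might look like the sketch below; this is my own illustration of that framing, not the actual HYMBA architecture from NVIDIA Research.

```python
# Naive illustration of a "hybrid" block: an attention path and an MLP path
# computed in parallel and fused with a learned balance. My own toy sketch,
# not the HYMBA design.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm = nn.LayerNorm(d_model)
        self.mix = nn.Parameter(torch.tensor(0.5))  # learned balance between the two paths

    def forward(self, x):
        h = self.norm(x)
        a, _ = self.attn(h, h, h)          # global token mixing
        m = self.mlp(h)                    # per-token channel mixing
        return x + self.mix * a + (1 - self.mix) * m

block = HybridBlock()
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```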