
Rylan Schaeffer
@RylanSchaeffer
CS PhD Student at Stanford Trustworthy AI Research with @sanmikoyejo. Prev interned/worked @ Meta, Google, MIT, Harvard, Uber, UCL, UC Davis
"we find that simply training on self-generations with the exact same arch can actually improve performance" Same as what we showed in our model collapse paper arxiv.org/abs/2404.01413 ! Synthetic data done correctly can be a fantastic resource!

Are ♾ parameters necessary for data efficiency wins? Via distillation, we compress an 8-ensemble into a single model and retain most of the improvement. Furthermore, we find that simply training on self-generations with the exact same arch can actually improve performance.
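For context on the distillation claim above, here is a minimal sketch of ensemble distillation (illustrative only; the shapes, temperature, and numpy setup are assumptions, not the paper's code). The student is trained to match the ensemble's averaged soft labels:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the teacher distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, ensemble_logits, T=2.0):
    # Cross-entropy between the averaged teacher soft labels and the student.
    # ensemble_logits: (n_teachers, batch, n_classes), e.g. an 8-model ensemble.
    teacher_probs = softmax(ensemble_logits, T).mean(axis=0)
    student_log_probs = np.log(softmax(student_logits, T) + 1e-12)
    return -(teacher_probs * student_log_probs).sum(axis=-1).mean()

# Toy usage: distill 8 teachers into one student on a 10-class problem.
rng = np.random.default_rng(0)
ensemble_logits = rng.normal(size=(8, 4, 10))  # 8 teachers, batch of 4
student_logits = rng.normal(size=(4, 10))
print(distillation_loss(student_logits, ensemble_logits))
```

The tweet's self-generation result swaps the external teachers for the model's own outputs; the soft-label objective above is just the standard distillation recipe.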

Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to…
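On "30B params (3B active)": in a sparse MoE, all experts' weights are stored, but a router sends each token to only a few of them, so per-token compute touches a small fraction of the total parameters. A generic top-k routing sketch (an illustration under assumed shapes; RND1's actual router and expert configuration may differ):

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=2):
    # Route a token to its top-k experts and mix their outputs.
    # All len(experts) weight matrices exist (total params), but only k are
    # applied per token (active params) -- hence 30B total vs ~3B active.
    scores = x @ gate_w                    # (n_experts,) routing logits
    top = np.argsort(scores)[-k:]          # indices of the k best experts
    w = np.exp(scores[top])
    w = w / w.sum()                        # softmax over the top-k scores
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # all stored
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=d)
print(moe_layer(x, experts, gate_w, k=2).shape)  # only 2 experts used
```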
Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow…
We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵
I've tried using GPT 5, GPT 5 Thinking, and GPT 5 Pro for AI/ML research. I'm underwhelmed :( I can't tell whether the problem is that my expectations are higher, but it feels like a non-improvement over previous OpenAI models.
The voxel pagodas have a special place in my heart 🌸 It's been my go-to vibe eval since early Deep Think development. Glad it made it into the demo!
The hardest scaling law prediction problems are no match for @YingXiao armed only with a spreadsheet and his measuring stick
Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal performance at IMO, solving 5/6 problems with rigorous proofs as verified by official IMO judges! Congrats to all involved! deepmind.google/discover/blog/…
Come to Convention Center West, room 208-209 (2nd floor), to learn about optimal data selection using compression like gzip! tl;dr: you can learn much faster if you use gzip compression distances to select data for a given task! DM if you are interested or want to use the code!
🚨 What’s the best way to select data for fine-tuning LLMs effectively? 📢Introducing ZIP-FIT—a compression-based data selection framework that outperforms leading baselines, achieving up to 85% faster convergence in cross-entropy loss and selecting data up to 65% faster. 🧵1/8
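To make the compression idea concrete, here is a minimal sketch of compression-based data selection using the classic normalized compression distance (NCD) with gzip. ZIP-FIT's actual alignment metric may differ; this only illustrates the principle of ranking candidates by how well they compress jointly with task data:

```python
import gzip

def clen(s: str) -> int:
    # Length of the gzip-compressed bytes of s.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    # Normalized compression distance: smaller means more similar, because
    # gzip exploits substrings shared between x and y when compressing x + y.
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def select_data(candidates, task_sample, n):
    # Keep the n candidates closest to the task sample in compression distance.
    return sorted(candidates, key=lambda c: ncd(c, task_sample))[:n]

task = "def add(a, b):\n    return a + b"
pool = [
    "def mul(a, b):\n    return a * b",
    "The weather in Paris is lovely in spring.",
    "def sub(a, b):\n    return a - b",
]
print(select_data(pool, task, n=2))  # the two code snippets should rank first
```

Because gzip exploits shared substrings, examples that share structure with the task compress well jointly and get a low distance, with no model in the loop.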

Large Language Monkeys are scaling and they are hungry! 🍌 #ICML2025

I'll be at @icmlconf #ICML2025 next week to present three papers - reach out if you want to chat about generative AI, scaling laws, synthetic data or any other AI topic! #1 How Do Large Language Monkeys Get Their Power (Laws)? x.com/RylanSchaeffer…

If you want to learn about the power (laws) of large language monkeys (and get a free banana 🍌), come to our poster at #ICML2025 !!
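A rough intuition for where the power laws come from (a simulation under assumed distributions, not the paper's analysis): if problem i has per-attempt success probability p_i, then pass@k on that problem is 1 - (1 - p_i)^k, and aggregating over problems whose p_i pile up near zero produces coverage curves that look like power laws in the number of samples k:

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed heavy mass near p = 0; the real per-problem distribution would be
# estimated from model samples.
p = rng.beta(0.3, 3.0, size=100_000)

for k in [1, 10, 100, 1000]:
    coverage = 1.0 - np.mean((1.0 - p) ** k)  # expected fraction solved in k tries
    print(f"k={k:5d}  coverage={coverage:.3f}")
```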

One of two claims is true: 1) we fully automated AI/ML research, published one paper, and then did nothing else with the technology, or 2) people lie on Twitter. Come to our poster E2809 at #ICML2025 now to find out which!!!
We refused to cite the paper due to severe misconduct by the authors of that paper: plagiarism of our own prior work, predominantly AI-generated content (ya, the authors plugged our paper into an LLM and generated another paper), IRB violations, etc. Revealed during a long…
I'm excited to announce that @pfau is wrong - people do still care about non-language-modeling ML research! #ICML2025

Post-AGI employment opportunity: AI Chain of Thought Inspector?
If you don't train your CoTs to look nice, you could get some safety from monitoring them. This seems good to do! But I'm skeptical this will work reliably enough to be load-bearing in a safety case. Plus as RL is scaled up, I expect CoTs to become less and less legible.
#ICML2025 hot take: A famous researcher (redacted) said they feel like AI safety / existential risk from AI is the most important challenge of our time, and despite many researchers being well intentioned, this person feels like the field has produced no deliverables, has no idea…
An #ICML2025 story from yesterday: @BrandoHablando and I made a new friend who told us that she saw me give a talk at NeurIPS 2023 and then messaged Brando to chat, without realizing that Brando and I are different people 😂
