Cameron R. Wolfe, Ph.D.

@cwolferesearch

Research @Netflix • Writer @ Deep (Learning) Focus • PhD @optimalab1 • I make AI understandable

Science & Technology

cameronrwolfe.me

Joined August 2021

4K Posts · 28K Followers · 680 Following

Pinned

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Apr 16

Reinforcement Learning (RL) is quickly becoming the most important skill for AI researchers. Here are the best resources for learning RL for LLMs…

TL;DR: RL is more important now than it has ever been, but (probably due to its complexity) there aren’t a ton of great resources…

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 25

The original PPO-based RLHF pipeline had 4 model copies:

1. Policy
2. Reference
3. Critic
4. Reward Model

Recent GRPO-based RLVR pipelines have eliminated all of these models except for the policy.

- The critic is no longer needed because values are estimated from group…

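A minimal Python sketch of how a group of completions can stand in for the critic, assuming the common GRPO recipe: sample several completions for one prompt, score each one, then standardize each reward against the group's mean and standard deviation. `verifiable_reward` is a hypothetical exact-match check standing in for an RLVR verifier (in place of a learned reward model), not any specific pipeline's grader. Whether to use the population or sample standard deviation, or to divide by it at all, varies between implementations.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each reward against the group of completions sampled for one prompt."""
    mean = statistics.fmean(rewards)
    # Population std here (an assumption); some implementations use the sample std.
    std = statistics.pstdev(rewards)
    # No critic: the group mean plays the role of the baseline a value model would provide.
    return [(r - mean) / (std + eps) for r in rewards]


def verifiable_reward(completion: str, answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 for an exact (whitespace-stripped) match, else 0.0."""
    return 1.0 if completion.strip() == answer.strip() else 0.0


# Example: four sampled completions for one prompt whose reference answer is "42".
completions = ["42", "41", "40", "42"]
rewards = [verifiable_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)     # [1.0, 0.0, 0.0, 1.0]
print(advantages)  # roughly [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group gets the same reward, all advantages are zero, so that prompt contributes no learning signal for that step.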

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 24

Interesting note from Olmo 3: the KL divergence term is excluded from the GRPO loss. This is becoming a standard choice for reasoning / RL training pipelines, and it doesn't seem to cause training instability. Yet another reminder that RL for LLMs is very different from traditional deep RL.

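A rough sketch of what "excluding the KL term" means in code, assuming a GRPO-style loss built from the clipped PPO surrogate plus the optional k3 KL estimator used in DeepSeekMath's GRPO (pi_ref / pi_theta - log(pi_ref / pi_theta) - 1, computed per token). With `beta=0.0`, reference-model log-probs are never read, which is what lets a pipeline drop the reference model entirely. The function name and arguments are hypothetical, and averaging over tokens of a single completion is just one of several aggregation choices in use.

```python
import math


def grpo_completion_loss(logp_new, logp_old, advantage, logp_ref=None,
                         clip_eps=0.2, beta=0.0):
    """Per-token clipped surrogate (to minimize) for one completion, averaged over its tokens.

    logp_new / logp_old / logp_ref: per-token log-probs under the current policy,
    the sampling policy, and the (optional) reference model.
    advantage: the completion's group-relative advantage, shared by all its tokens.
    """
    if beta > 0.0 and logp_ref is None:
        raise ValueError("a KL penalty (beta > 0) needs reference-model log-probs")
    total = 0.0
    for t, (lp_new, lp_old) in enumerate(zip(logp_new, logp_old)):
        ratio = math.exp(lp_new - lp_old)                      # importance ratio
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        loss = -min(ratio * advantage, clipped * advantage)    # pessimistic PPO bound
        if beta > 0.0:
            log_r = logp_ref[t] - lp_new                       # log(pi_ref / pi_theta)
            loss += beta * (math.exp(log_r) - log_r - 1.0)     # k3 KL estimate, >= 0
        total += loss
    return total / len(logp_new)


# KL-free variant: no reference model needed at all.
lp_old = [-1.0, -2.0, -0.5]
lp_new = [x + 0.1 for x in lp_old]
print(grpo_completion_loss(lp_new, lp_old, advantage=1.0))
```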

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 20

The Olmo technical reports / artifacts are by far the most useful resource for those working on LLMs outside of closed frontier labs. You can read the papers, read the code, look at the data, and even train the models yourself. No other resource provides this level of detail, and…

Nathan Lambert

@natolambert

Nov 20

We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents:

1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.

This is a big…

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 18

Generalized Advantage Estimation (GAE), used in PPO, is one of the most complicated aspects of reinforcement learning (RL). Here’s how it works and how we can implement it…

The advantage tells us how much better a given action is compared to the average action in a given state:…

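A minimal Python sketch of the standard GAE recursion (Schulman et al.), assuming a single trajectory with critic estimates V(s_t). It computes the TD residual delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and accumulates A_t = delta_t + gamma * lambda * A_{t+1} backward in time, which equals the sum over l of (gamma * lambda)^l * delta_{t+l}. The function name, argument layout, and default `gamma` / `lam` values are illustrative choices, not taken from any particular PPO implementation.

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Return (advantages, returns) for one trajectory.

    rewards[t]: reward received after the action at step t.
    values[t]:  critic estimate V(s_t).
    last_value: V(s_T) for the state after the final step (0.0 if the episode ended).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD residual
        gae = delta + gamma * lam * gae                       # A_t = delta_t + gamma*lam*A_{t+1}
        advantages[t] = gae
        next_value = values[t]
    # Value-function targets: advantage plus the baseline it was measured against.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


# lam=1 recovers Monte Carlo returns minus V; lam=0 gives one-step TD advantages.
print(compute_gae([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0))
# ([1.5, 0.5, 0.5], [2.0, 1.0, 1.0])
```

The `lam` knob trades bias for variance: small values lean on the critic (lower variance, more bias), while values near 1 lean on observed rewards.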

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 14

This is (in my opinion) one of the top-3 most useful books to be written on LLMs. I highly recommend reading / buying it. I've personally read it >10 times since Nathan started writing it.

Nathan Lambert

@natolambert

Nov 14

I'm excited to announce my RLHF Book is now in pre-order for the Manning Early Access Program (MEAP), @ManningBooks, and for this milestone it's 50% off.

Excited to land in print in early 2026! Lots of improvements coming soon.

Link below & thanks for the support!

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 7

The next AI Agents in Production conference is on November 18th. For those interested in the practical side of LLMs / agents, this is a good event to attend. Some highlights:

- Completely free.
- Everything can be viewed online.
- Good talks from top companies (OAI, GDM, Meta,…

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 5

Couldn't be more excited for better interaction between X / Substack. Check out my newsletter here: cameronrwolfe.substack.com

Chris Best

@chrisbest

Nov 4

Update 2: even correcting for the fake views, traffic to Substack links from X is up substantially. (full post reads, signups, etc. also track.) We're so back!

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 4

The memory folding mechanism proposed in this paper is great. It makes sense that agents should spend time explicitly compressing their memory into a semantic / organized format to avoid context explosion.

Worth mentioning though that memory compression / retention in agents…

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 4

assistive coding tools definitely make me more productive, but the pattern isn't uniform. biggest productivity boost comes later in the day / at night when I'm mentally exhausted. LLMs lower the barrier to entry for getting extra work done. validating or iterating on code with an…

Cameron R. Wolfe, Ph.D.

@cwolferesearch

Nov 1

The value of RL is very clearly / nicely articulated by DeepSeekMath…

- RL enhances maj@k (majority vote), but not pass@k.
- RL boosts the probability of correct completions that are already in top-k.
- RL does NOT clearly enhance model capabilities.

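To see why these two metrics can move differently, here is a small Python sketch of both, assuming per-problem scoring over a set of sampled final answers. `pass_at_k` uses the unbiased estimator popularized by the Codex paper (Chen et al., 2021): with n samples of which c are correct, it is the probability that at least one of k randomly chosen samples is correct. `maj_at_k` scores the single most common answer, breaking ties by first occurrence (an assumption; evaluation harnesses differ). Function names are illustrative.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: P(at least one of k samples drawn from n is correct), c correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def maj_at_k(answers: list[str], reference: str) -> float:
    """1.0 if the most common answer among the k samples matches the reference, else 0.0."""
    winner, _ = Counter(answers).most_common(1)[0]  # ties go to the first-seen answer
    return float(winner == reference)


samples = ["12", "7", "12", "12", "9"]
n, c = len(samples), samples.count("12")
print(round(pass_at_k(n, c, 1), 3))  # 0.6
print(round(pass_at_k(n, c, 2), 3))  # 0.9
print(maj_at_k(samples, "12"))       # 1.0
```

Read against the bullets above: shifting probability mass onto answers the model already samples can turn a narrow plurality into a clear majority (raising maj@k), while pass@k at large k mostly asks whether a correct answer shows up at all, which that sharpening does not necessarily change.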

Cameron R. Wolfe, Ph.D. reposted

Sairam Sundaresan

@DSaience

Oct 30

I can't believe I'm saying this - I'm officially a published author :D

After three years, my first book is out.

"AI for the Rest of Us" with @BloomsburyAcad is finally in the world.

I wrote it because I watched too many people get left behind in AI conversations.

The gap…

Cameron R. Wolfe, Ph.D. reposted

Bloomsbury Academic

@BloomsburyAcad

Oct 30

"Through clever storytelling and illustration, [Sundaresan] brings technical concepts to life[.]" — Dr. Cameron R. Wolfe, Senior Research Scientist at Netflix (@cwolferesearch) Learn more: bit.ly/42ZCs4z @DSaience