Cameron R. Wolfe, Ph.D.
@cwolferesearch
Research @Netflix • Writer @ Deep (Learning) Focus • PhD @optimalab1 • I make AI understandable
Reinforcement Learning (RL) is quickly becoming the most important skill for AI researchers. Here are the best resources for learning RL for LLMs… TL;DR: RL is more important now than it has ever been, but (probably due to its complexity) there aren’t a ton of great resources…
The original PPO-based RLHF pipeline had 4 model copies:
1. Policy
2. Reference
3. Critic
4. Reward Model
Recent GRPO-based RLVR pipelines have eliminated all of these models except for the policy.
- The critic is no longer needed because values are estimated from group…
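The group-based value estimate mentioned above can be sketched in a few lines: rewards for a group of completions sampled from the same prompt are normalized against the group's own mean and standard deviation, so no learned critic is needed. A minimal illustration, not any particular library's implementation:

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each completion's reward
    against the mean/std of its own sampling group (no learned critic)."""
    mean = statistics.mean(rewards)
    std = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions that beat the group average get positive advantages; the rest get negative ones, so the policy gradient pushes probability toward the better samples.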
Interesting note from Olmo-3 that KL divergence is excluded from the GRPO loss. This is becoming a standard choice for reasoning / RL training pipelines, and it doesn't seem to cause training instability. Yet another reminder that RL for LLMs is very different from traditional DeepRL.
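One common way to write the per-token GRPO loss makes the KL penalty toward the reference model a separate term with its own coefficient; setting that coefficient to zero drops the KL term entirely, matching the choice described above. This is a hedged sketch with hypothetical parameter names, not Olmo-3's actual code:

```python
import math

def grpo_token_loss(logp_new, logp_old, logp_ref, advantage,
                    clip_eps=0.2, kl_coef=0.0):
    """Clipped policy-gradient term plus an optional KL penalty toward
    a reference model; kl_coef=0.0 removes the KL term entirely."""
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * advantage
    pg_loss = -min(unclipped, clipped)
    # Unbiased "k3" KL estimator often used in GRPO implementations.
    kl = math.exp(logp_ref - logp_new) - (logp_ref - logp_new) - 1
    return pg_loss + kl_coef * kl
```

With `kl_coef=0.0` the reference model's log-probability never enters the loss, which is also why the reference model copy can be dropped from the pipeline.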
The Olmo technical reports / artifacts are by far the most useful resource for those working on LLMs outside of closed frontier labs. You can read the papers, read the code, look at the data, and even train the models yourself. No other resource provides this level of detail, and…
We present Olmo 3, our next family of fully open, leading language models. This family of 7B and 32B models represents:
1. The best 32B base model.
2. The best 7B Western thinking & instruct models.
3. The first 32B (or larger) fully open reasoning model.
This is a big…
Generalized Advantage Estimation (GAE), used in PPO, is one of the most complicated aspects of reinforcement learning (RL). Here’s how it works and how we can implement it… The advantage tells us how much better a given action is compared to the average action in a given state:…
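The recursive form of GAE can be sketched in a few lines: compute the one-step TD error at each timestep, then accumulate an exponentially weighted sum of those errors in a single backward pass. Parameter names (`gamma`, `lam`) follow the usual conventions; this is an illustrative implementation, not code from any specific library:

```python
def gae(rewards, values, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized Advantage Estimation: an exponentially weighted sum
    of TD errors, computed in one backward pass over the trajectory."""
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Recursion: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Setting `lam=0` reduces this to the plain TD error (low variance, high bias), while `lam=1` recovers the full Monte Carlo advantage (high variance, low bias); intermediate values trade the two off.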
This is (in my opinion) one of the top-3 most useful books to be written on LLMs. I highly recommend reading / buying it. I've personally read it >10 times since Nathan started writing it.
I'm excited to announce my RLHF Book is now in pre-order for the Manning Early Access Program (MEAP), @ManningBooks, and for this milestone it's 50% off. Excited to land in print in early 2026! Lots of improvements coming soon. Link below & thanks for the support!
The next AI Agents in Production conference is on November 18th. For those interested in the practical side of LLMs / agents, this is a good event to attend. Some highlights:
- Completely free.
- Everything can be viewed online.
- Good talks from top companies (OAI, GDM, Meta,…
Couldn't be more excited for better interaction between X / substack. Check out my newsletter here: cameronrwolfe.substack.com
Update 2: even correcting for the fake views, traffic to Substack links from X is up substantially. (full post reads, signups, etc. also track.) We're so back!
The memory folding mechanism proposed in this paper is great. It makes sense that agents should spend time explicitly compressing their memory into a semantic / organized format to avoid context explosion. Worth mentioning though that memory compression / retention in agents…
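A toy sketch of the folding idea: all but the most recent memory entries are compressed into a single summary entry, so context grows only up to a fixed bound. The `summarize` callable is a hypothetical stand-in for an LLM summarization call, not the paper's actual mechanism:

```python
def fold_memory(entries, keep_recent=4, summarize=None):
    """Fold all but the most recent entries into one compressed summary
    entry, bounding context growth. `summarize` stands in for an LLM
    call in a real agent (hypothetical interface)."""
    if len(entries) <= keep_recent:
        return list(entries)
    old, recent = entries[:-keep_recent], entries[-keep_recent:]
    # Placeholder compression; a real agent would call a model here.
    summary = summarize(old) if summarize else "summary(" + "; ".join(old) + ")"
    return [summary] + recent
```

The point of folding (vs. naive truncation) is that the old entries still contribute information through the summary instead of being dropped outright.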
assistive coding tools definitely make me more productive, but the pattern isn't uniform. biggest productivity boost comes later in the day / at night when I'm mentally exhausted. LLMs lower the barrier to entry for getting extra work done. validating or iterating on code with an…
The value of RL is very clearly / nicely articulated by DeepSeekMath…
- RL enhances maj@k (majority vote), but not pass@k.
- RL boosts the probability of correct completions that are already in top-k.
- RL does NOT clearly enhance model capabilities.
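The maj@k vs. pass@k distinction above can be made concrete with a small sketch (illustrative helper names, assuming exact-match answer comparison): pass@k asks whether *any* of the k sampled answers is correct, while maj@k asks whether the *majority-vote* answer is correct.

```python
from collections import Counter

def pass_at_k(answers, correct):
    """pass@k: at least one of the k sampled answers is correct."""
    return any(a == correct for a in answers)

def maj_at_k(answers, correct):
    """maj@k: the most frequent answer among the k samples is correct."""
    most_common, _ = Counter(answers).most_common(1)[0]
    return most_common == correct
```

If the correct answer already appears among the samples but is outvoted, RL can flip maj@k from wrong to right by concentrating probability on it, without ever changing pass@k, which is exactly the DeepSeekMath observation.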
I can't believe I'm saying this - I'm officially a published author :D After three years, my first book is out. "AI for the Rest of Us" with @BloomsburyAcad is finally in the world. I wrote it because I watched too many people get left behind in AI conversations. The gap…
"Through clever storytelling and illustration, [Sundaresan] brings technical concepts to life[.]" — Dr. Cameron R. Wolfe, Senior Research Scientist at Netflix (@cwolferesearch) Learn more: bit.ly/42ZCs4z @DSaience