David D. Baek
@dbaek__
PhD Student @ MIT EECS / Mechanistic Interpretability, Scalable Oversight
🚨 AI Safety Arms Race: Even after OpenAI’s emergent misalignment patching, we can easily leverage their SFT API to obtain a Turncoat GPT Model (not even adversarial fine-tuning, and can even easily bypass the detection from @johnschulman2’s recent work) that produces even more…
BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple discovered: (hint: we're not as close to AGI as the hype suggests)
Today, the most competent AI systems in almost *any* domain (math, coding, etc.) are broadly knowledgeable across almost *every* domain. Does it have to be this way, or can we create truly narrow AI systems? In a new preprint, we explore some questions relevant to this goal...
Interested in the science of language models but tired of neural scaling laws? Here's a new perspective: our new paper presents neural thermodynamic laws -- thermodynamic concepts and laws naturally emerge in language model training! AI is naturAl, not Artificial, after all.
1/14: If sparse autoencoders work, they should give us interpretable classifiers that help with probing in difficult regimes (e.g. data scarcity). But we find that SAE probes consistently underperform! Our takeaway: mech interp should use stronger baselines to measure progress 🧵
(1/N) LLMs represent numbers on a helix? And use trigonometry to do addition? Answers below 🧵