David Elson
@davidelson
AGI Alignment/Safety @ Google DeepMind. Opinions my own
Gemini 3.0 Frontier Safety report ⬇️
Frontier Safety Framework report for Gemini 3 Pro, presenting risk assessments and evaluation results in CBRN, Cybersecurity, Harmful Manipulation, Machine Learning R&D and Misalignment domains. storage.googleapis.com/deepmind-media…
New paper, following up on our chain-of-thought faithfulness work from a few months ago, about how we can make sure that LLM thoughts are staying faithful and monitorable.
CoT monitoring is one of our best shots at AI safety. But it's fragile and could be lost due to RL or architecture changes. Would we even notice if it starts slipping away? 🧵
New paper showing that when LLMs chew over tough problems, they tend to think clearly and transparently -- making them easier to monitor for bad behavior ⬇️
Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes! Our finding: "When Chain of Thought is Necessary, Language Models…
We're hiring for our Google DeepMind AGI Safety & Alignment and Gemini Safety teams. Locations: London, NYC, Mountain View, SF. Join us to help build safe AGI.
Research Engineer: boards.greenhouse.io/deepmind/jobs/……
Research Scientist: boards.greenhouse.io/deepmind/jobs/…
Some promising results on keeping AIs from scheming against you - or at least removing the incentive for them to do this.
New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in🧵