David Elson

@davidelson

AGI Alignment/Safety @ Google DeepMind. Opinions my own

Joined September 2008

13Posts 74Followers 82Following

You might like

@andrewsilva9

@lessteza

@Sylvia_Sparkle

@zachthinks

@miskinmd

David Elson

Nov 19

Gemini 3.0 Frontier Safety report ⬇️

Victoria Krakovna

Nov 18

Frontier Safety Framework report for Gemini 3 Pro, presenting risk assessments and evaluation results in CBRN, Cybersecurity, Harmful Manipulation, Machine Learning R&D and Misalignment domains. storage.googleapis.com/deepmind-media…

David Elson

Nov 1

New paper, following up on our chain-of-thought faithfulness work from a few months ago, about how we can make sure that LLM thoughts are staying faithful and monitorable.

Scott Emmons

Oct 31

CoT monitoring is one of our best shots at AI safety. But it's fragile and could be lost due to RL or architecture changes. Would we even notice if it starts slipping away? 🧵

emmons_scott's tweet image. CoT monitoring is one of our best shots at AI safety. But it's fragile and could be lost due to RL or architecture changes.

Would we even notice if it starts slipping away? 🧵

David Elson

Jul 10

New paper showing that when LLMs chew over tough problems, they tend to think clearly and transparently -- making them easier to monitor for bad behavior ⬇️

Scott Emmons

Jul 9

Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes! Our finding: "When Chain of Thought is Necessary, Language Models…

emmons_scott's tweet image. Is CoT monitoring a lost cause due to unfaithfulness? 🤔

We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes!

Our finding: "When Chain of Thought is Necessary, Language Models…

David Elson reposted

Zac Kenton

Feb 10

We're hiring for our Google DeepMind AGI Safety & Alignment and Gemini Safety teams. Locations: London, NYC, Mountain View, SF. Join us to help build safe AGI. Research Engineer boards.greenhouse.io/deepmind/jobs/…… Research Scientist boards.greenhouse.io/deepmind/jobs/…

David Elson

Jan 23

Some promising results on keeping AIs from scheming against you - or at least removing the incentive for them to do this.

David Lindner

Jan 23

New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in🧵

davlindner's tweet image. New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward?

Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them!

Inspired by myopic optimization but better performance – details in🧵

Noralnar

@Noralnar326

sid

@sidgreddy

Carolina

@zgoncbanti54101

Antonia

@alicksorli9852

Kirill

@Reason239

Abhijit AI

@Abhijit690

skfinport 🇺🇸

@skfinport

Arsenio Bellingham

@l2_norm

Roli Bosch

@rolibosch

agusti

@bleuonbase

Advait

@advtydv

Sam Kuhn

@SamKuhnDev

Ari

@Ari_S_123

Joe

@joemkwon

SGY.

@SGY2157

Jorge Bravo Abad

@bravo_abad

William Wale

@williawa

Rational Animations

@RationalAnimat1

Prahitha Movva @NeurIPS2025

@PrahithaM

Riley Goodside

@goodside

Alex Turner

@Turn_Trout

Ucirakork

@Ucirakork0634

Manuela

@uD6l565r2L8OS

Tedoliath

@Tedoliath

Ray Bell -DoIT-

@BellDoit

shakal

@shakalkos

geoHeil

@geoHeil

saso badovinac

@sasobadovinac

SouNd

@SouNdmys

Aymeric Lacroix 💎

@TwitAymeric

Samuel Opatowski

@Opatowski01

Alejandro Zenker

@alejandrozenker

itlay fang

@FangItlay80016

Simple Magic

@Simplemagic_ai

𝐂𝐨𝐥𝐞 🦔

@cole_barcia

RootFTW

@RootFTW

Murakashi

@MasasiMurakashi

Othman

@OthHak

Saurabh Chopra

@saurabhchopraa

superslooth

@superslooth2

Andreas Kirsch 🇺🇦

@BlackHC

Rota

@pli_cachete

Saber Darabi

@SADarabi

Samuel Albanie 🇬🇧

@SamuelAlbanie

Eakirxe

@Eakirxe7837435

Erik Jenner

@jenner_erik

Raymond Ng

@Raymondng_aisg

Derek Parfait

@haus_cole

Shannon Yang

@shannonyangsky

JAEHYEONG_KIM

@jhk40160806

Adam Gleave

@ARGleave

Ryan Greenblatt

@RyanPGreenblatt

anshuman

@athleticKoder

Yaakov Katz

@yaakovkatz

Tom Everitt

@tom4everitt

Dr. Eli David

@DrEliDavid

Jack Lindsey

@Jack_W_Lindsey

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭

@elder_plinius

Erik Jenner

@jenner_erik

Scott Emmons

@emmons_scott

Haviv Rettig Gur

@havivrettiggur

Columbia Jewish & Israeli Students ✡️🇮🇱

@CUJewsIsraelis

Jan Leike

@janleike

Columbia Jewish Alumni Assoc.

@CU_JewishAlumni

Steve McGuire

@sfmcguire79

Zvi Mowshowitz

@TheZvi

Allan Dafoe

@AllanDafoe

Gavin Baker

@GavinSBaker

Agus 🔎🔸

@austinc3301

Anca Dragan

@ancadianadragan

Caleb Biddulph

@CalebBiddulph

Vikrant Varma

@VikrantVarma_

Sebastian Farquhar

@seb_far

David Lindner

@davlindner

Rohin Shah

@rohinmshah

Alex Turner

@Turn_Trout

AI Notkilleveryoneism Memes ⏸️

@AISafetyMemes

Chris Anderson

@chr1sa

Simon Willison

@simonw

Logan Kilpatrick

@OfficialLoganK

Sriram Krishnan

@sriramk

Christopher Ahlberg

@cahlberg

Starcloud

@Starcloud_Inc_

Department of Government Efficiency

@DOGE

Lewis Tunstall

@_lewtun

Jay Nagy

@JayNagy

FutureAzA

@FutureAZA

Ashok Elluswamy

@aelluswamy

Marc Andreessen 🇺🇸

@pmarca

Noah Smith 🐇🇺🇸🇺🇦🇹🇼

@Noahpinion

Demis Hassabis

@demishassabis

Jeffrey Ladish

@JeffLadish

Peter D Carter

@PCarterClimate

Ted Werbel

@tedx_ai

James Edward Hansen

@DrJamesEHansen

Dan Miller

@danmiller999

Philipp Schmid

@_philschmid

Sam D'Amico

@sdamico

Dan Hendrycks

@hendrycks

Leon Simons 🌍

@LeonSimons8

United States Trends

1. #GMMTV2026 3.75M posts
2. Good Tuesday 34.8K posts
3. MILKLOVE BORN TO SHINE 557K posts
4. #NuestraBanderaEsBolívar 2,184 posts
5. #tuesdayvibe 2,644 posts
6. Taco Tuesday 12.2K posts
7. Alan Dershowitz 4,019 posts
8. WILLIAMEST MAGIC VIBES 92.1K posts
9. Mark Kelly 229K posts
10. Happy Thanksgiving 17.9K posts
11. University of Minnesota N/A
12. Enron 1,863 posts
13. Praying for Pedro N/A
14. #25Nov 2,493 posts
15. Mainz Biomed N.V. N/A
16. Hegseth 107K posts
17. Maddow 17.7K posts
18. JOSSGAWIN MAGIC VIBES 34.1K posts
19. #DittoSeries 104K posts
20. Naps 2,943 posts

You might like

Something went wrong.

Something went wrong.