David D. Baek

@dbaek__

PhD Student @ MIT EECS / Mechanistic Interpretability, Scalable Oversight

Cambridge, MA

dbaek.org

Joined February 2024

41 Posts · 2K Followers · 33 Following

David D. Baek reposted

Jiawei Zhang

Oct 28

🚨 AI Safety Arms Race: Even after OpenAI’s emergent misalignment patching, we can easily leverage their SFT API to obtain a Turncoat GPT Model (not even adversarial fine-tuning, and can even easily bypass the detection from @johnschulman2’s recent work) that produces even more…

David D. Baek reposted

Ruben Hassid

Jun 7

BREAKING: Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well. Here's what Apple discovered: (hint: we're not as close to AGI as the hype suggests)

David D. Baek reposted

Eric J. Michaud

May 22

Today, the most competent AI systems in almost *any* domain (math, coding, etc.) are broadly knowledgeable across almost *every* domain. Does it have to be this way, or can we create truly narrow AI systems? In a new preprint, we explore some questions relevant to this goal...

David D. Baek reposted

Ziming Liu

May 17

Interested in the science of language models but tired of neural scaling laws? Here's a new perspective: our new paper presents neural thermodynamic laws -- thermodynamic concepts and laws naturally emerge in language model training! AI is naturAl, not Artificial, after all.

David D. Baek reposted

Josh Engels

Feb 25

1/14: If sparse autoencoders work, they should give us interpretable classifiers that help with probing in difficult regimes (e.g. data scarcity). But we find that SAE probes consistently underperform! Our takeaway: mech interp should use stronger baselines to measure progress 🧵

David D. Baek reposted

Subhash Kantamneni

Feb 5

(1/N) LLMs represent numbers on a helix? And use trigonometry to do addition? Answers below 🧵

WandaMacPherson

@yFBM6BVj6t2vQ9x

Jael

@weingart6251893

ΛRobΛ

@QmniFold

Anabella

@Rauku423

Soria Jules

@jsor_ia

Surya Sree

@suryasreerm

Jiawei Zhang

@jiaweiz_7

Michael Pavlik

@MichaelPavlik5

Elin

@Awrini145

nor

@norxornor

Maureen

@dv2o2va5s64Oj5

Esrieaulas

@Esrieaulas536

John Doe

@Brain_n_Music

Gosjt

@Breijit1800

sudo

@sudo_silly

Rana

@roy_rudradev

Jourdain-Alexander Casale

@headertag

Allegra

@Ufaoukal646095

Rua

@ruasnv

Jeevesh Juneja

@xdfbhkl

mechanistic

@mechaanistic

云创兽Ai

@Frirdea675

Qofer

@Qofer6125

Souvik Bhattacharya

@souvikb1812

Jabir

@jabstec

Amarendhar Reddy

@AmarendharRed17

Charlotte Yan

@CharlotteY5051

Amour Aime

@AmourAime43796

Shin ✈️ ICML

@shinshin_oob

Jack Jingyu Zhang

@jackjingyuzhang

Zorothepirate_

@younghoax20

vm

@thevedantmisra

kyokopom

@Kyokopom

Manan Gupta

@mgisabsolute

Zeyneb Kaya

@zeynebnkaya

Zubair Latifullah

@ZubairLatifulla

paulo henrique

@Paulo96005492

MrPizza Farmer (option_f4)

@F4Option

srija

@liltigertwts

Reuben Narad

@ReubenNarad

Ausτin McCaffrey

@Austin_Aligned

David Reber

@davidpreber

Nicole

@Nicole2590213

Kim Borgen (eu/acc)

@kim_borgen

Arihant

@k5sko

unruly abstractions

@unrulyabstract

Tomasz Sternal

@TomaszSternal

BigData RPG

@BoydSorratat

🌲ぼんさい🌳

@v0n5ai

Abdulaziz

@_SoloIdentity

Jenn Wortman Vaughan

@jennwvaughan

Jiawei Zhang

@jiaweiz_7

lily (xiaoqing)

@lilysun004

Shi Feng

@ihsgnef

Michael Sun

@asianmathnerd

Ryan Greenblatt

@RyanPGreenblatt

CLS

@ChengleiSi

Will Kirby

@wk1rby

Arnab Sen Sharma

@arnab_api

Chuang Gan

@gan_chuang

Todor Mihaylov

@tbmihaylov

Google DeepMind

@GoogleDeepMind

Nina

@NinaPanickssery

Cheuk Hei Chu

@CheukHeiChu

ICLR 2026

@iclr_conf

Tony Chen

@tonychenxyz

Chloe Loughridge

@ChloeLough333

Omar Khattab

@lateinteraction

Xiang Fu

@xiangfu_ml

Wendy Sun

@wendy_sunq

Anish Mudide

@amudide

Josh Engels

@JoshAEngels

Subhash Kantamneni

@thesubhashk

Wes Gurnee

@wesg52

OpenAI

@OpenAI

Elon Musk

@elonmusk

Sam Altman

@sama

Vedang Lad

@vedanglad

Isaac Liao

@LiaoIsaac91893

Eric J. Michaud

@ericjmichaud_

Carl Guo

@CarlGuo866

Ziming Liu

@ZimingLiu11

Max Tegmark

@tegmark
