
davidad 🎇

@davidad

Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death

Pinned

Séb has laid out an unprecedentedly substantive vision for how human societies can do well in the AGI transition. If the question “aligned to whom?” feels intractable to you, you should read it:

My piece is now also available on AI Policy Perspectives! If you haven't read it, now is the time. If you already have, great opportunity to read it a second time but with a different font. 🐙aipolicyperspectives.com/p/coasean-barg…

sebkrier's tweet image.


davidad 🎇 reposted

extremely important finding: don't tell your model you're rewarding it for A and then reward it for B, or it will learn you're its adversary

Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is “inoculation prompting”: framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization.

AnthropicAI's tweet image.


davidad 🎇 reposted

Neurological disorders cost the EU & USA >$1.7T annually, yet effective neurotech is limited by surgical complexity. We’re scoping a programme to build brain surgery-free interfaces, allowing responsive, outpatient therapies for earlier intervention. ↓ link.aria.org.uk/SNI-MSNthesis-x


it’s a good model

Yup! On our overall "accurate + supported + direct" metric for Q&A, GPT-5 is at 74% vs Opus 4.5 at 76% and Gemini 3 Pro at 71%. On accuracy, GPT-5 is close to Opus 4.5 - 93.8% vs 96.5% - but doesn't beat it. For directness, GPT-5 is ahead (97.3% vs 94.7%). GPT-5 has the worst…

stuhlmueller's tweet image.


davidad 🎇 reposted

Gemini 3 Pro Preview is getting close to human level performance on SimpleBench

scaling01's tweet image.

I know it’s a trope to say that one’s new colleague would be one’s top pick of literally any human in the world for the role they’re starting in, but I don’t think I have ever used this trope because it has always seemed like hyperbole. Until now. Welcome, Kathleen.

We’re excited to introduce our new CEO: Kathleen Fisher. ARIA is at an inflection point. We’re moving from launching ambitious research to driving it forward. Kathleen is the ideal leader to scale our work. She led DARPA’s HACMS programme – successfully defending a helicopter…

ARIA_research's tweet image.


davidad 🎇 reposted

Great post, more people need to be thinking about this: h/t @dwarkesh_sp for bringing it to my attention.

DKokotajlo's tweet image.

davidad 🎇 reposted

Keep the following command handy this week...
❯ npm install -g @google/gemini-cli@latest


davidad 🎇 reposted

As AI collapses coordination costs, our new thesis - Scaling Trust - explores how scalable trust infrastructure could usher in a world of many AI agents, capable of mobilising, negotiating, and verifying on our behalf across digital + physical spaces ↓ link.aria.org.uk/ST-thesis-X


davidad 🎇 reposted

If you give Sonnet 4.5 this post, along with other research on LLM introspection, it gets better at guessing a secret string from its previous hidden chain-of-thought!

Sauers_'s tweet image.

HOW INFORMATION FLOWS THROUGH TRANSFORMERS

Because I've looked at those "transformers explained" pages and they really suck at explaining.

There are two distinct information highways in the transformer architecture:
- The residual stream (black arrows): Flows vertically through…

repligate's tweet image.
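(A minimal numpy sketch of the two "highways" described above, assuming a standard decoder block and omitting LayerNorm, multi-head splitting, and causal masking; every name in it is illustrative rather than taken from the thread. The residual stream carries each position's vector straight through the layers, and attention is the only step where information moves between positions.)

    import numpy as np

    def attention(x, Wq, Wk, Wv):
        # x: (seq, d). This is the only step that mixes information
        # ACROSS positions -- the "horizontal" highway.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(k.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ v

    def mlp(x, W1, W2):
        # Acts on each position independently -- no cross-position flow.
        return np.maximum(x @ W1, 0.0) @ W2

    def block(x, params):
        # Each sublayer only ADDS its output to the residual stream
        # (the "vertical" highway), which otherwise passes through unchanged.
        x = x + attention(x, *params["attn"])
        x = x + mlp(x, *params["mlp"])
        return x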


davidad 🎇 reposted

AlphaEvolve (a pipeline of LLMs) doesn't just mutate algorithms; it can also prompt engineer its overseers.

g_leech_'s tweet image.

davidad 🎇 reposted

Can LMs learn to faithfully describe their internal features and mechanisms? In our new paper led by Research Fellow @belindazli, we find that they can—and that models explain themselves better than other models do.

TransluceAI's tweet image.

davidad 🎇 reposted

Complex global challenges are outpacing our ability to respond. In our new opportunity space, Collective Flourishing, @nwheeler443 asks if we can build new tools to create the future – from advanced modelling to large-scale deliberation methods. Read here: link.aria.org.uk/cf-x


davidad 🎇 reposted

Paper: aclanthology.org/2025.findings-…, with Kostas Arkoudas. Amazing 1-year progress: GPT-5, Grok-4, and Gemini-2.5-Pro vastly beat 2024-era leaders like GPT-4o, especially on harder tasks. For instance, in hard proof writing, Grok-4 gets 51% and GPT-5 gets 40%, compared to 2% for GPT-4o.

s_batzoglou's tweet image.

davidad 🎇 reposted

Google has released a new "Introduction to Agents" guide, which discusses a "self-evolving" agentic system (Level 4). "At this level, an agentic system can identify gaps in its own capabilities and create new tools or even new agents to fill them." kaggle.com/whitepaper-int…

deredleritt3r's tweet image.

From Google's new "Introduction to Agents" guide. Hard to overstate how big of a shift this is.

HadrianVeidt0's tweet image.


davidad 🎇 reposted

TLDR experimentally is: (1) base models are calibrated in standard settings, (2) RL post-training often breaks calibration, (3) chain-of-thought reasoning often breaks calibration. These follow as consequences of a unified theory; see the paper for more: arxiv.org/abs/2511.04869

PreetumNakkiran's tweet image.
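(For context on what "calibrated" means here: among answers a model gives with confidence p, roughly a fraction p should be correct. Below is a minimal sketch of the usual way this is measured, expected calibration error; it is my own illustration, not code from the paper, and the binning scheme is just one common choice.)

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        # Bin predictions by stated confidence, then compare mean confidence
        # to empirical accuracy in each bin; a calibrated model has ECE near 0.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
        return ece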

Users say AI models get “nerfed” sometimes, and often suspect that model weights get quantized; model providers deny this. Could it be floating-point non-associativity? Do models learn to use dimensions *in order* of something like variance, or otherwise rely on kernel details?
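(A quick illustration of the floating-point non-associativity point, assuming nothing about any particular provider's stack: summing the same numbers in a different order can give different results, so a change in reduction order, kernel tiling, or batching can shift outputs slightly even with identical weights.)

    import numpy as np

    # Rounding happens at every step, so (a + b) + c and a + (b + c) can differ.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0 -- the 1.0 is lost below the spacing of floats near 1e16

    # The same effect shows up in large reductions: summation order matters.
    xs = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)
    print(float(xs.sum()), float(np.sort(xs).sum()))  # typically differ slightly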


davidad 🎇 reposted

The world needs honest and courageous entrepreneurs and communicators who care for the common good. We sometimes hear the saying: “Business is business!” In reality, it is not so. No one is absorbed by an organization to the point of becoming a mere cog or a simple function. Nor…


davidad 🎇 reposted

The @AISecurityInst Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. Application link below! 🧵

geoffreyirving's tweet image.
