
Jacob Pfau

@jacob_pfau

Alignment at UKAISI and PhD student at NYU

Jacob Pfau reposted

🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien: Open-weight LLM safety is both important & neglected. But we show that filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.

StephenLCasper's tweet image.

Jacob Pfau reposted

I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists! Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.

📢 Introducing the Alignment Project: a new fund for research on urgent challenges in AI alignment and control, backed by over £15 million.
▶️ Up to £1 million per project
▶️ Compute access, venture capital investment, and expert support
Learn more and apply ⬇️



Jacob Pfau reposted

Short background note about relativisation in debate protocols: if we want to model AI training protocols, we need results that hold even if our source of truth (humans for instance) is a black box that can't be introspected. 🧵

geoffreyirving's tweet image.
