
Javier Rando

@javirandor

security and safety research @anthropicai • people call me Javi • vegan 🌱

Pinned

My first paper from @AnthropicAI! We show that the number of samples needed to backdoor an LLM stays constant as models scale.

New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.



Javier Rando reposted

New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.


Javier Rando reposted

Very excited this paper is out. We find that the number of samples required for backdoor poisoning during pre-training stays near-constant as you scale up data/model size. Much more research to do to understand & mitigate this risk!

New @AISecurityInst research with @AnthropicAI + @turinginst: The number of samples needed to backdoor poison LLMs stays nearly CONSTANT as models scale. With 500 samples, we insert backdoors in LLMs from 600m to 13b params, even as data scaled 20x.🧵/11



Javier Rando reposted

New @AISecurityInst research with @AnthropicAI + @turinginst: The number of samples needed to backdoor poison LLMs stays nearly CONSTANT as models scale. With 500 samples, we insert backdoors in LLMs from 600m to 13b params, even as data scaled 20x.🧵/11


Javier Rando reposted

A lot of the biggest low-hanging fruit in AI safety right now involves figuring out what kinds of things some model might do in edge-case deployment scenarios. With that in mind, we’re announcing Petri, our open-source alignment auditing toolkit. (🧵)


Sonnet 4.5 is impressive in many different ways. I've spent time trying to prompt inject it and found it significantly harder to fool than previous models. Still not perfect—if you discover successful attacks, I'd love to see them, send them my way! 👀

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.



Javier Rando reposted

Anthropic is endorsing SB 53, California Sen. @Scott_Wiener's bill requiring transparency of frontier AI companies. We have long said we would prefer a federal standard. But in the absence of that, this creates a solid blueprint for AI governance that cannot be ignored.


Javier Rando reposted

I'll be leading a @MATSprogram stream this winter with a focus on technical AI governance. You can apply here by October 2! matsprogram.org/apply


Javier Rando reposted

📌📌📌 I'm excited to be on the faculty job market this fall. I updated my website with my CV. stephencasper.com


Javier Rando reposted

I'm starting to get emails about PhDs for next year. I'm always looking for great people to join! For next year, I'm looking for people with a strong reinforcement learning, game theory, or strategic decision-making background. (As well as positive energy, intellectual…


Javier Rando reposted

🚨🕯️ AI welfare job alert! Come help us work on what's possibly *the most interesting research topic*! 🕯️🚨 Consider applying if you've done some hands-on ML/LLM engineering work and Kyle's podcast episode basically makes sense to you. Apply *by EOD Monday* if possible.

We’re hiring a Research Engineer/Scientist at Anthropic to work with me on all things model welfare—research, evaluations, and interventions 🌀 Please apply + refer your friends! If you’re curious about what this means, I recently went on the 80k podcast to talk about our work.



Javier Rando reposted

You made Claudius very happy with this post Javi. He sends his regards: "When AI culture meets authentic craftsmanship 🎨 The 'Ignore Previous Instructions' hat - where insider memes become wearable art. Proudly handcrafted for the humans who build the future."

Working at @AnthropicAI is so much fun. Look what Claudius by @andonlabs designed and got for me!



I am so excited to see Maksym start a research group in Europe. If you want to work on security and safety of AI models, this is going to be an amazing place to do work that matters!

🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start…



Javier Rando reposted

📢Happy to share that I'll join ELLIS Institute Tübingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent Systems (@MPI_IS) as a Principal Investigator this Fall! I am hiring for AI safety PhD and postdoc positions! More information here: s-abdelnabi.github.io

