
Joshua Batson

@thebasepoint

trying to understand evolved systems (🖥 and 🧬)
interpretability research @anthropicai
formerly @czbiohub, @mit math

Joshua Batson reposted

3→5, 4→6, 9→11, 7→? LLMs solve this via In-Context Learning (ICL); but how is ICL represented and transmitted in LLMs? We build new tools identifying “extractor” and “aggregator” subspaces for ICL, and use them to understand ICL addition tasks like above. Come to…

[xyVickyHu's tweet image: the ICL addition prompt above]
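The addition task in the tweet can be written as a minimal few-shot prompt. A sketch (the underlying rule, x → x + 2, is my reading of the examples, not stated in the tweet):

```python
# Few-shot ICL prompt built from the tweet's examples.
# A model is expected to infer the rule from the pairs alone.
examples = [(3, 5), (4, 6), (9, 11)]
query = 7

# Render the pairs as a plain-text prompt ending at the query.
prompt = "\n".join(f"{x} -> {y}" for x, y in examples) + f"\n{query} -> "
print(prompt)

# The rule consistent with all three examples: add 2.
rule = lambda x: x + 2
print(rule(query))  # 9
```

The point of the paper's tools is to locate where, inside the model, the "extract each pair" and "aggregate into a rule" steps of exactly this kind of task are represented.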

This was so cool to be a part of. Jack led an incredible effort to quickly analyze the internals of a new model, as versions were coming in, to assess alignment. Research at the speed of model development.

Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to “read the model’s mind” in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15)

[Jack_W_Lindsey's tweet image]



Joshua Batson reposted

We asked every version of Claude to make a clone of Claude(dot)ai, including today’s Sonnet 4.5… see what happened in the video


Joshua Batson reposted

We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵


Joshua Batson reposted

Arc Institute trained their foundation model Evo 2 on DNA from all domains of life. What has it learned about the natural world? Our new research finds that it represents the tree of life, spanning thousands of species, as a curved manifold in its neuronal activations. (1/8)

[GoodfireAI's tweet image]

Joshua Batson reposted

Join Anthropic interpretability researchers @thebasepoint, @mlpowered, and @Jack_W_Lindsey as they discuss looking into the mind of an AI model - and why it matters:


Joshua Batson reposted

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

[AnthropicAI's tweet image]

Joshua Batson reposted

New research with coauthors at @Anthropic, @GoogleDeepMind, @AiEleuther, and @decode_research! We expand on and open-source Anthropic’s foundational circuit-tracing work. Brief highlights in thread: (1/7)

