
Joshua Batson

@thebasepoint

trying to understand evolved systems (🖥 and 🧬)
interpretability research @anthropicai
formerly @czbiohub, @mit math

Joshua Batson reposted

3→5, 4→6, 9→11, 7→? LLMs solve this via In-Context Learning (ICL); but how is ICL represented and transmitted in LLMs? We build new tools identifying “extractor” and “aggregator” subspaces for ICL, and use them to understand ICL addition tasks like above. Come to…

[xyVickyHu's tweet image: the ICL addition prompt above]
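The addition task in the tweet can be written as a minimal few-shot prompt. A sketch (the underlying rule, x → x + 2, is my reading of the examples, not stated in the tweet):

```python
# Few-shot ICL prompt built from the tweet's examples.
# A model is expected to infer the rule from the pairs alone.
examples = [(3, 5), (4, 6), (9, 11)]
query = 7

# Render the pairs as a plain-text prompt ending at the query.
prompt = "\n".join(f"{x} -> {y}" for x, y in examples) + f"\n{query} -> "
print(prompt)

# The rule consistent with all three examples: add 2.
rule = lambda x: x + 2
print(rule(query))  # 9
```

The point of the paper's tools is to locate where, inside the model, the "extract each pair" and "aggregate into a rule" steps of exactly this kind of task are represented.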

This was so cool to be a part of. Jack led an incredible effort to quickly analyze the internals of a new model, as versions were coming in, to assess alignment. Research at the speed of model development.

Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to “read the model’s mind” in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15)

[Jack_W_Lindsey's tweet image]



Joshua Batson reposted

We asked every version of Claude to make a clone of Claude(dot)ai, including today’s Sonnet 4.5… see what happened in the video


Joshua Batson reposted

We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵


Joshua Batson reposted

Arc Institute trained their foundation model Evo 2 on DNA from all domains of life. What has it learned about the natural world? Our new research finds that it represents the tree of life, spanning thousands of species, as a curved manifold in its neuronal activations. (1/8)

[GoodfireAI's tweet image]

Joshua Batson reposted

Join Anthropic interpretability researchers @thebasepoint, @mlpowered, and @Jack_W_Lindsey as they discuss looking into the mind of an AI model - and why it matters:


Joshua Batson reposted

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

[AnthropicAI's tweet image]

Joshua Batson reposted

New research with coauthors at @Anthropic, @GoogleDeepMind, @AiEleuther, and @decode_research! We expand on and open-source Anthropic’s foundational circuit-tracing work. Brief highlights in thread: (1/7)

