
Adam Pearce

@adamrpearce

@anthropicai, previously: google brain, @nytgraphics and @bbgvisualdata

Adam Pearce reposted @wesg52:

New paper! We reverse-engineered the mechanisms underlying Claude Haiku’s ability to perform a simple “perceptual” task. We discover beautiful feature families and manifolds, clean geometric transformations, and distributed attention algorithms!


Adam Pearce reposted @Jack_W_Lindsey:

Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to “read the model’s mind” in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15)


Adam Pearce reposted @mlpowered:

Earlier this year, we showed a method to interpret the intermediate steps a model takes to produce an answer. But we were missing a key bit of information: explaining why the model attends to specific concepts. Today, we do just that 🧵


Adam Pearce reposted @ghandeharioun:

🧵Can we “ask” an LLM to “translate” its own hidden representations into natural language? We propose 🩺Patchscopes, a new framework for decoding specific information from a representation by “patching” it into a separate inference pass, independently of its original context. 1/9
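
The core operation is simple to sketch: cache a hidden state from one forward pass, then overwrite a hidden state in a second pass and let the model decode it. Below is a minimal illustration with GPT-2 and PyTorch forward hooks; the layer index, the few-shot "identity" target prompt, and the patch position are all illustrative assumptions rather than the paper's exact setup:

```python
# Minimal cross-prompt activation-patching sketch (the Patchscopes idea).
# Layer, prompts, and positions are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 6  # which block's output to read and patch (assumption)
src_prompt = "The Eiffel Tower is located in the city of"
tgt_prompt = "cat -> cat; 135 -> 135; hello -> hello; ?"  # identity prompt

# 1. Source pass: cache the last token's hidden state after block LAYER.
#    hidden_states[0] is the embeddings, so block L's output is index L + 1.
src_ids = tok(src_prompt, return_tensors="pt").input_ids
with torch.no_grad():
    hidden = model(src_ids, output_hidden_states=True).hidden_states
patch_vec = hidden[LAYER + 1][0, -1]

# 2. Target pass: a forward hook overwrites the hidden state at the "?"
#    position with the cached vector, so later layers "see" the foreign
#    representation in a fresh context.
tgt_ids = tok(tgt_prompt, return_tensors="pt").input_ids
patch_pos = tgt_ids.shape[1] - 1

def patch_hook(module, inputs, output):
    output[0][0, patch_pos] = patch_vec  # in-place edit of the residual stream
    return output

handle = model.transformer.h[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(tgt_ids).logits
handle.remove()

# Greedy decode of the next token: the model's "translation" of the vector.
print(tok.decode(int(logits[0, -1].argmax())))
```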


Voronoi diagrams showing the regions a plot's pointerX and pointerY select. blocks.roadtolarissa.com/1wheel/ecd4050…
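
The trick behind this block: the Voronoi cell containing the pointer is, by definition, the cell of the nearest point, so "which region does the pointer select?" reduces to a nearest-neighbor query. A minimal Python sketch of that reduction (the original block is d3/JavaScript; the point data here is an illustrative assumption):

```python
# Pointer selection via Voronoi regions = nearest-neighbor lookup.
# The scattered points are an illustrative assumption.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(size=(30, 2))  # scatterplot positions in [0, 1]^2
tree = cKDTree(points)

def select(pointer_x, pointer_y):
    """Index of the point whose Voronoi cell contains the pointer."""
    _, idx = tree.query([pointer_x, pointer_y])
    return int(idx)

print(select(0.5, 0.5))
```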


Confidently Incorrect Models to Humble Ensembles by @Nithum, @balajiln and Jasper Snoek. pair.withgoogle.com/explorables/un…
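
The explorable's core claim is easy to reproduce: averaging the predicted probabilities of several independently trained models typically yields lower confidence than any single member on inputs far from the training data. A minimal sketch, assuming scikit-learn and a synthetic two-moons dataset (both illustrative choices, not the explorable's setup):

```python
# Minimal "humble ensemble" sketch: average predictions of several
# independently seeded models. Dataset and architecture are assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.25, random_state=0)

# Same architecture, different random initializations.
ensemble = [
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=s).fit(X, y)
    for s in range(5)
]

x_ood = np.array([[4.0, -4.0]])  # a point far from the training data

single = ensemble[0].predict_proba(x_ood)[0]
avg = np.mean([m.predict_proba(x_ood)[0] for m in ensemble], axis=0)

print(f"single-model confidence: {single.max():.2f}")  # often near 1.0
print(f"ensemble confidence:     {avg.max():.2f}")     # typically lower
```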


Most machine learning models are trained by collecting vast amounts of data on a central server. @nicki_mitch and I looked at how federated learning makes it possible to train models without any user's raw data leaving their device. pair.withgoogle.com/explorables/fe…
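
For intuition, here is a minimal sketch of federated averaging, the canonical algorithm for this setting: each simulated client runs gradient descent on its own private data, and only the updated weight vectors, never the raw examples, are sent to the server and averaged. The linear model and synthetic data are illustrative assumptions:

```python
# Minimal federated-averaging (FedAvg) sketch with a linear model.
# Data, model, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

def make_client_data(n=50):
    """Private data that never leaves the simulated device."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    return X, y

clients = [make_client_data() for _ in range(4)]

def local_update(w, X, y, lr=0.05, steps=10):
    """On-device training: gradient descent on the client's own data."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(2)
for _ in range(20):
    # Each round, clients train locally; only weights cross the network.
    local_ws = [local_update(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_ws, axis=0)  # server-side averaging

print("recovered weights:", w_global)  # close to true_w, without pooling raw data
```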


It's not Spider-Man's fault: why Best Picture winners aren't hits anymore. roadtolarissa.com/box-office-hit…

