logangraham's profile picture. make things radically good 🌎 @anthropicai

Logan Graham

@logangraham

make things radically good 🌎 @anthropicai

Good thread of our teams’ work. We wanted to be really earnest and transparent about what we thought seeing the model crush a bunch of our evals. Autonomy, bio reasoning, and cyber skills are coming fast.

Claims in the system card indicate Opus 4.5 as a possibly big jump in AI R&D and coding capabilities, with employees saying it gives them a 2x boost and Opus performing better than humans on 4-8 hour tasks. Looking forward to third-party evals and field reports. More in thread.



3 vignettes from using Opus 4.5 in the past few weeks: 1. It's genuinely funny. Talking to it on Slack, it’s probably a ~85%ile poster in the company. 2. It crosses a writing threshold. I have a really high bar for writing. This is the first model I want to use. 3. I've…

the new vibe:

__nmca__'s tweet image. the new vibe:


I am pretty excited about this. Few missions as exciting as building the technological defense layer for humanity (e.g. bio, cyber) so we can get on with doing wild creative stuff (especially including with bio). Kudos @hannu and team!

Today, we are launching Red Queen Bio (redqueen.bio), an AI biosecurity company, with a $15M seed led by @OpenAI. Biorisk grows exponentially with AI capabilities. Our mission is to scale biological defenses at the same rate. A 🧵 on who we are + what we do!👇🏻1/19



The Frontier Red Team team did a fun experiment! Check out our video. There's a serious point here: we want to know what it's like when models start controlling the physical world.

New Anthropic research: Project Fetch. We asked two teams of Anthropic researchers to program a robot dog. Neither team had any robotics expertise—but we let only one team use Claude. How did they do?



There are few teams as smart, or working as hard at creating AI-for-science, as @SGRodriques and @andrewwhite01 and the others at @EdisonSci. You should probably play with Kosmos.

Today, we’re announcing Kosmos, our newest AI Scientist, available to use now. Users estimate Kosmos does 6 months of work in a single day. One run can read 1,500 papers and write 42,000 lines of code. At least 79% of its findings are reproducible. Kosmos has made 7 discoveries…



Wow! Big if true

We've trained an unsupervised language model that can generate coherent paragraphs and perform rudimentary reading comprehension, machine translation, question answering, and summarization — all without task-specific training: blog.openai.com/better-languag…



After I left govt and before @anthropicai, I explored building a biodefense startup. There are few missions as generationally inspiring. And it was clear the startups were going to be very cool. I'll be following @ValthosTech!

Today, @ValthosTech launches with $30M from @Lux_Capital, @OpenAI, and @foundersfund. For the past few months I’ve been working with their world-class team (@MIT, @GoogleDeepMind, @PalantirTech) building America’s modern next-gen infrastructure. Hiring –– DM me.



“God has a special providence for fools, drunkards, and the United States of America.”

logangraham's tweet image. “God has a special providence for fools, drunkards, and the United States of America.”

Logan Graham أعاد

I guess it's now every day until the end of time

GPT-5 Pro found a counterexample to the NICD-with-erasures majority optimality (Simons list, p.25). simons.berkeley.edu/sites/default/… At p=0.4, n=5, f(x) = sign(x_1-3x_2+x_3-x_4+3x_5) gives E|f(x)|=0.43024 vs best majority 0.42904.

PI010101's tweet image. GPT-5 Pro found a counterexample to the NICD-with-erasures majority optimality (Simons list, p.25).
simons.berkeley.edu/sites/default/…

At p=0.4, n=5, f(x) = sign(x_1-3x_2+x_3-x_4+3x_5) gives E|f(x)|=0.43024 vs best majority 0.42904.
PI010101's tweet image. GPT-5 Pro found a counterexample to the NICD-with-erasures majority optimality (Simons list, p.25).
simons.berkeley.edu/sites/default/…

At p=0.4, n=5, f(x) = sign(x_1-3x_2+x_3-x_4+3x_5) gives E|f(x)|=0.43024 vs best majority 0.42904.


Loading...

Something went wrong.


Something went wrong.