
Scott Swingle

@bio_bootloader

Father of 3, building Mentat (the github native coding agent!) @AbanteAI, prev @DeepMind

Pinned

if you haven't tried Mentat for a while... give it a spin


join the mentat party now! a new kind of interactive social experience!

Introducing the Mentat Party Agent. It’s a coding agent running in the cloud that anyone can interact with - without even logging in. I started it with a simple web app with a chatroom and snake. What will it become? Up to you. mentat . ai / party x.com/i/broadcasts/1…



Scott Swingle reposted

Party chat showcases so much of what makes Mentat unique:
- Every chat is a group chat
- Long-running chats
- Great mobile site






Scott Swingle reposted

Code Agent SOTA as of Oct 6th 2025:
- Codex in Codex CLI is the smartest and best model. Its endurance and thoroughness are insane. Slow.
- Cheetah in Cursor is the best pair programmer by far. It is so fast you can voice input and get instant, good changes. Bottleneck is…


Sonnet 4.5 beats Sonnet 4 as the best long code context model

Both still in a class of their own!


Scott Swingle reposted

Sonnet 4.5 is live on Mentat now!


> The evaluations we ran simply didn't capture the degradation users were reporting, in part because Claude often recovers well from isolated mistakes.

we give Mentat agents an "exit survey" to report harness bugs, even if they found a workaround
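The "exit survey" idea can be sketched as a final prompt plus a parser that turns the agent's answer into structured bug reports. Everything here is hypothetical: the prompt wording, the JSON shape, and the names `EXIT_SURVEY_PROMPT`, `HarnessBug`, and `parse_exit_survey` are illustrative assumptions, not Mentat's implementation.

```python
import json
from dataclasses import dataclass

# Hypothetical survey text sent to the agent once its task is done.
EXIT_SURVEY_PROMPT = (
    "Before you finish: did any tool or other part of the harness misbehave "
    "during this task, even if you found a workaround? Answer with JSON of the "
    'form {"harness_bugs": [{"tool": "...", "description": "...", '
    '"workaround": "..."}]}, using an empty list if nothing went wrong.'
)


@dataclass
class HarnessBug:
    """One harness problem reported by the agent (hypothetical schema)."""
    tool: str
    description: str
    workaround: str = ""


def parse_exit_survey(reply: str) -> list[HarnessBug]:
    """Turn the agent's survey answer into reports; malformed answers yield none."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict) or not isinstance(data.get("harness_bugs"), list):
        return []
    reports = []
    for item in data["harness_bugs"]:
        # Skip entries missing the fields needed to triage the bug.
        if isinstance(item, dict) and "tool" in item and "description" in item:
            reports.append(
                HarnessBug(
                    tool=str(item["tool"]),
                    description=str(item["description"]),
                    workaround=str(item.get("workaround", "")),
                )
            )
    return reports
```

Collecting these reports across runs is meant to surface bugs that never show up as failed tasks, precisely because the agent worked around them.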


it's easy to get to 95% or even 100% of code "written by AI". But that's not the same as a 20x speedup! Complex tasks in real codebases aren't even sped up 2-3x yet. It's coming. Saying it's here already is premature.

“For our Claude Code team 95% of the code is written by Claude.” —Anthropic cofounder Benjamin Mann

One person can build 20X the code they could before

The future is here, just not evenly distributed
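One way to make the gap concrete is Amdahl's law, under the loose assumption that the share of code written by AI stands in for the share of work it covers. If a fraction p of the work gets faster by a factor s, the overall speedup is S below. Fully automating 95% of the work (s → ∞) gives exactly 20x; if that 95% still costs human time for prompting, review, and debugging and only gets, say, 3x faster (an illustrative number, not a measurement), the total lands under 3x.

```latex
\begin{align*}
S &= \frac{1}{(1 - p) + p / s} \\
p = 0.95,\ s \to \infty:\quad S &= \frac{1}{0.05} = 20 \\
p = 0.95,\ s = 3:\quad S &= \frac{1}{0.05 + 0.95/3} \approx \frac{1}{0.367} \approx 2.7
\end{align*}
```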



Scott Swingle reposted

I wrote a blog post about using the same agent to review and write code


solution I came up with: a PostToolUse hook, matching Bash, running a script that runs `gh pr checks --watch --fail-fast` if Claude's command contained `git push` or `gh pr create`, and if that fails, shows Claude the output

makes Claude Code feel a bit more like Mentat

is there a way to make my claude code session wake up when ci fails on github? so I don't have to message it to check ci?
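A minimal sketch of the PostToolUse hook script described above, written in Python. It assumes Claude Code passes the hook event as JSON on stdin with `tool_name` and `tool_input.command`, and that a PostToolUse hook exiting with code 2 has its stderr shown to Claude. The names `TRIGGERS`, `should_watch_ci`, and `watch_ci` are hypothetical; this is not the author's actual script.

```python
#!/usr/bin/env python3
"""Sketch of a Claude Code PostToolUse hook: after a Bash call that pushed
code or opened a PR, wait for CI and report failures back to Claude."""
import json
import subprocess
import sys

# Substrings suggesting the Bash command pushed code or opened a PR (assumption).
TRIGGERS = ("git push", "gh pr create")


def should_watch_ci(payload: dict) -> bool:
    """Return True if this hook event is a Bash call that pushed or opened a PR."""
    if payload.get("tool_name") != "Bash":
        return False
    command = payload.get("tool_input", {}).get("command", "")
    return any(trigger in command for trigger in TRIGGERS)


def watch_ci() -> subprocess.CompletedProcess:
    """Block until the current branch's PR checks finish, stopping at the first failure."""
    return subprocess.run(
        ["gh", "pr", "checks", "--watch", "--fail-fast"],
        capture_output=True,
        text=True,
    )


def main() -> int:
    payload = json.load(sys.stdin)
    if not should_watch_ci(payload):
        return 0
    result = watch_ci()
    if result.returncode != 0:
        # Exit code 2 from a PostToolUse hook surfaces stderr to Claude.
        sys.stderr.write("CI checks failed:\n" + result.stdout + result.stderr)
        return 2
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

To try it, one would register the script under `hooks.PostToolUse` in Claude Code's settings with `"matcher": "Bash"` and a `command` hook pointing at the file. CI usually runs longer than a hook's default timeout, so that hook's `timeout` likely needs raising. Right after a push, `gh pr checks` may also find no PR or no checks yet; this sketch simply passes that output to Claude as well.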



after much testing and tuning, I think this is still the case. At least for a coding agent like Mentat, Sonnet handles uncertainty and exploration better and is more thorough than GPT-5. It's much more expensive and slower though, so there are real tradeoffs

crazy that anthropic has had the best model continuously for over a year now



feels like a lot of these ideas will simply work now that we have powerful in-context learning

AlphaGo needed RL because we didn't have a generally intelligent base. Now models can start teaching themselves


In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from. In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit…



after tons of testing with Mentat:

Sonnet: Great default behavior, but ignores a lot of prompts. Needs strong pushes to change behavior

GPT-5: Super steerable. Actually does what I say! But I have to tell it what to do more

Depending on use case each can be good


every day at midnight do all of OpenAI's api prompt caches invalidate simultaneously?


whoa OpenAI is giving GPT-5 on the API a prompt before the one I set??

they give it:
- instructions on formatting (which contradict my own)
- today's date
- telling it that it's being used over the API
- a bunch of other stuff

not cool

