Dan Mac

@daniel_mac8

AI Field Engineer @sourcegraph | Writing Token Stream | Goodness, Truth and AI | Building at http://github.com/DannyMac180

Pinned

Aristotle, an AI system specialized for mathematics from @HarmonicMath, solved Erdős problem #481. Days ago the same system solved problem #124. Controversy ensued as #124 was supposedly the "easy" version. #481 is *not* an easy version. Terence Tao even commented: "Nice!"…

Anthropic acquires Bun.

The strategy is clear: Claude becomes the full-stack AI computing platform.

> Claude Opus/Sonnet/Haiku = Compute
> Claude Code = Orchestration
> Bun = Execution

Claude isn't just the intelligence.
It's the entire AI environment.

Claude is a flywheel.

Anthropic is acquiring @bunjavascript to further accelerate Claude Code’s growth. We're delighted that Bun—which has dramatically improved the JavaScript and TypeScript developer experience—is joining us to make Claude Code even better. Read more: anthropic.com/news/anthropic…



Opus 4.5 is a glorious creature.


If the OG is hyped, I’m hyped. Sam-ta Claus is coming to town.

“Altman said OpenAI is planning to ship a new reasoning model next week that is ‘ahead of [Google’s] Gemini 3’”

Honest question: is Grok 4.1 Fast really better than Opus 4.5 at tool calling or is this pure unadulterated benchmaxxing?

Very unexpected results! Grok 4.1 Fast Reasoning beats every frontier model in Tau2-Verified! Congrats team! I was certainly not expecting a Fast model to beat @AnthropicAI 's Opus 4.5 in agentic tasks @xai @elonmusk @Yuhu_ai_ Check it out: github.com/amazon-agi/tau…


Sama declares 🔴 Code Red 🔴 at OpenAI. The chart below shows why. For the first time since Nov. '22, OpenAI is falling behind Google and Anthropic on overall model capability, not just on coding or cost/performance ratio. Don't count OpenAI out of the race just yet though...…

Oh, you think your LLM is bad at instruction following? Try getting your 2.5-year-old to go to bed…


Pretty sure David Sacks is safe here. Do you know anyone who regularly reads the New York Times? Not even trying to be a dick. The only regular interaction anyone I know has with the NYT is Wordle. Sad, because it was the best.

INSIDE NYT’S HOAX FACTORY Five months ago, five New York Times reporters were dispatched to create a story about my supposed conflicts of interest working as the White House AI & Crypto Czar. Through a series of “fact checks” they revealed their accusations, which we debunked…


Non-human intelligences creating new knowledge. Pretty, pretty cool.

Claude by @AnthropicAI proved Erdős problem #124 in Lean.



Dan Mac reposted

I often get asked “what are the best models”, so I added my current model recommendations list here!

Finally revamped all the @RepoPrompt docs, making them more approachable and up to date! And, you can now copy all the docs to the clipboard to talk to an LLM about them! It's also the last day of the Black Friday sale! repoprompt.com/docs



Dan Mac reposted

Sorry DeepSeek bros, these benchmarks aren’t very impressive. Is DeepSeek still relevant?

🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents! 🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API. 🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now. 📄 Tech…

