
Burny - Effective Curiosity

@burny_tech

On the quest to understand the fundamental mathematics of intelligence and of the universe with curiosity. http://burnyverse.com Upskilling @StanfordOnline

Pinned

Hey! Follow me for explorations of intelligence, mathematics, science, engineering, technology, artificial intelligence, machine learning, physics, computer science, (not only computational) neuroscience, cognitive science, transhumanism, AI engineering, AI's benefits, risks,…


Burny - Effective Curiosity reposted

Hi new followers! I’m a mathematician at Harvard. I have a YouTube channel, where I discuss math the way I think about it, now with two playlists on differential geometry and complex geometry (in progress). Comments welcome!
DG: youtube.com/watch?v=rVTN7V…
CG: youtube.com/watch?v=N5FQHg…


Burny - Effective Curiosity reposted

An interesting property of ARC 3 is that it is more accessible to children than ARC 1 & 2, while being much more difficult for current AI systems


Burny - Effective Curiosity reposted

2024 evals
can it count letters 🥺
can it do college stuff 🤓
are its solutions diverse 👉👈

2025 evals
has it worked for 30 hours yet 🦾
has it increased gdp 📈
has it discovered novel math 🧮


Burny - Effective Curiosity reposted

/Humanitys-Last-Exam/
├─ Humanity_Last_Exam.docx
├─ Humanity_Last_Exam_final.docx
├─ Humanity_Last_Exam_FINAL.docx
├─ Humanity_Last_Exam_FINAL_FINAL.docx
├─ Humanity_Last_Exam_REAL_FINAL.docx
├─ Humanity_Last_Exam_REAL_FINAL_v2.docx
├─…


Burny - Effective Curiosity reposted

We recently wrote that GPT-5 is likely the first mainline GPT release to be trained on less compute than its predecessor. How did we reach this conclusion, and what do we actually know about how GPT-5 was trained? 🧵


Burny - Effective Curiosity reposted

sota on arc-agi-1 and -2, sota on artificial intelligence (composite), blows everything (including 4.5 sonnet) away on METR task length...

it might not be forever, and it might not be for your use case, but gpt-5 is the world's best overall model; any other claim is cope


Burny - Effective Curiosity reposted

A calculation by the late physicist Freeman Dyson suggested that no plausible experiment could be conducted to confirm the existence of gravitons, the hypothetical particles of gravity. A new proposal overturns that conventional wisdom. quantamagazine.org/it-might-be-po…


Burny - Effective Curiosity reposted

One AI year is seven Internet ones.


Burny - Effective Curiosity reposted

Official METR results for Claude 4.5 Sonnet

it doesn't beat GPT-5
at the 80% success-rate threshold it is even below o3, Opus 4 and Opus 4.1


We estimate that Claude Sonnet 4.5 has a 50%-time-horizon of around 1 hr 53 min (95% confidence interval of 50 to 235 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min.



Burny - Effective Curiosity reposted

We estimate that Claude Sonnet 4.5 has a 50%-time-horizon of around 1 hr 53 min (95% confidence interval of 50 to 235 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min.

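For context on where a figure like 1 hr 53 min comes from: METR estimates a model's 50% time horizon by fitting a logistic curve of task success probability against the human time each task takes, then solving for the task duration at which predicted success is 50%. A minimal sketch of that calculation with made-up task data and a hand-rolled fit (illustrative only, not METR's pipeline):

```python
# Hedged sketch: METR-style 50% time horizon. Fit logistic regression of
# task success against log(human task duration), then solve for the
# duration where predicted success is 50%. All data below is made up.
import numpy as np

# (human minutes to complete the task, did the model succeed 0/1)
tasks = [(2, 1), (5, 1), (15, 1), (30, 1), (60, 1), (90, 0),
         (120, 1), (180, 0), (240, 0), (480, 0)]
x = np.log([t for t, _ in tasks])            # log task duration
y = np.array([s for _, s in tasks], float)   # success indicator

w, b = 0.0, 0.0
for _ in range(20000):                       # plain gradient descent on log-loss
    p = 1 / (1 + np.exp(-(w * x + b)))
    w -= 0.1 * ((p - y) * x).mean()
    b -= 0.1 * (p - y).mean()

# p = 0.5 exactly when w*log(t) + b = 0, i.e. t = exp(-b/w)
print(f"50% time horizon ≈ {np.exp(-b / w):.0f} minutes")
```

The reported 95% confidence interval would come from resampling over tasks (e.g. bootstrapping), which this sketch omits.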


2.5 years of AI progress
Modelscope (left)
Grok Imagine 0.9 (right)



Burny - Effective Curiosity reposted

New ARC-AGI SOTA: GPT-5 Pro
- ARC-AGI-1: 70.2%, $4.78/task
- ARC-AGI-2: 18.3%, $7.41/task

@OpenAI’s GPT-5 Pro now holds the highest verified frontier LLM score on ARC-AGI’s Semi-Private benchmark


Didn't expect such a jump over the non-Pro version.

New ARC-AGI SOTA: GPT-5 Pro
- ARC-AGI-1: 70.2%, $4.78/task
- ARC-AGI-2: 18.3%, $7.41/task

@OpenAI’s GPT-5 Pro now holds the highest verified frontier LLM score on ARC-AGI’s Semi-Private benchmark



Burny - Effective Curiosity reposted

A group of physicists say they know the entropy of what is causing gravity. youtube.com/watch?v=qNt2bh…

YouTube: This Paper Might Change How We See Gravity


Burny - Effective Curiosity reposted

Yann LeCun's team is continuously advancing JEPA. Their new study reveals that the anti-collapse term in Joint Embedding Predictive Architectures (JEPAs) does more than just prevent trivial representations — it implicitly estimates data density. This means any trained JEPA…


Burny - Effective Curiosity reposted

Evolution Strategies can be applied at scale to fine-tune LLMs, and outperforms PPO and GRPO in many model settings! Fantastic paper “Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning” by @yule_gan, Risto Miikkulainen and team. arxiv.org/abs/2509.24372

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for…
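The core loop behind "exploring directly in parameter space" is short. Below is a minimal Evolution Strategies sketch using the standard population-based gradient estimator; the toy reward function and all sizes are illustrative stand-ins for an episode-level LLM reward, not the paper's actual setup:

```python
# Hedged sketch of a basic Evolution Strategies loop: sample Gaussian
# perturbations of the parameters, weight them by normalized reward, and
# step the mean parameters. A toy quadratic stands in for an LLM reward.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=10)            # stand-in for model parameters
sigma, lr, pop = 0.1, 0.02, 64         # noise scale, step size, population

def reward(params):
    # Hypothetical reward: higher as params approach an arbitrary target.
    return -np.sum((params - 1.0) ** 2)

for step in range(300):
    eps = rng.normal(size=(pop, theta.size))
    rewards = np.array([reward(theta + sigma * e) for e in eps])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta += lr / (pop * sigma) * eps.T @ adv   # ES gradient estimate

print("final reward:", reward(theta))
```

What makes this attractive for LLM fine-tuning is that the reward only needs forward passes: no backpropagation through generation, so the whole population can be evaluated with inference-only infrastructure.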



Burny - Effective Curiosity reposted

Congratulations to Michel Devoret, Google Quantum AI’s Chief Scientist of Quantum Hardware, who was awarded the 2025 Nobel Prize in Physics today. Google now has five Nobel laureates among our ranks, including three prizes in the past two years. goo.gle/46EwKaG


Burny - Effective Curiosity reposted

🧠How can LLMs self-evolve over time? They need memory. LLMs burn huge compute on each query and forget everything afterward. ArcMemo introduces abstraction memory, which stores reusable reasoning patterns and recombines them to strengthen compositional reasoning. 📈On…


ArcMemo yields +7.5% relative on ARC-AGI vs o4-mini (same backbone). It extends the LLM idea of “compressing knowledge for generalization” into a lightweight, continually learnable abstract memory—model-agnostic and text-based. Preprint: Lifelong LM Learning via Abstract Memory

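A rough sketch of the abstraction-memory idea as the two posts describe it: store reusable, text-form reasoning patterns across episodes, then retrieve the relevant ones into the prompt for a new task. The class name, the example patterns, and the keyword-overlap retrieval heuristic are all illustrative stand-ins, not ArcMemo's actual code:

```python
# Hedged sketch of an abstraction memory: reusable text-based reasoning
# patterns stored across episodes and retrieved by keyword overlap so they
# can be prepended to a new query. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class AbstractionMemory:
    patterns: list = field(default_factory=list)

    def store(self, pattern: str) -> None:
        if pattern not in self.patterns:   # keep the memory deduplicated
            self.patterns.append(pattern)

    def retrieve(self, query: str, k: int = 3) -> list:
        q = set(query.lower().split())
        return sorted(self.patterns,
                      key=lambda p: -len(q & set(p.lower().split())))[:k]

    def compose_prompt(self, query: str) -> str:
        hints = "\n".join(f"- {p}" for p in self.retrieve(query))
        return f"Reusable patterns:\n{hints}\n\nTask:\n{query}"

mem = AbstractionMemory()
mem.store("if a shape repeats with one cell changed, the change is the rule")
mem.store("count objects per color before guessing the transformation")
print(mem.compose_prompt("grid puzzle: repeated shapes with color changes"))
```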


Burny - Effective Curiosity reposted

Impressive work.

My brain broke when I read this paper.

A tiny 7 million parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI-1 and ARC-AGI-2.

It's called Tiny Recursive Model (TRM) from Samsung.

How can a model 10,000x smaller be smarter?

Here's how…

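A schematic of the recursive-refinement idea behind a tiny model punching above its weight: one small network is reused many times, alternating between updating a latent scratchpad and revising the current answer, so effective depth comes from recursion rather than parameter count. The shapes, step counts, and update rules below are illustrative guesses, not TRM's actual architecture:

```python
# Hedged sketch of recursive refinement: reuse one tiny network to
# repeatedly update a latent scratchpad z from (x, y, z), then revise the
# answer y from (y, z). Sizes and step counts are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 32                                     # embedding width (illustrative)
W1 = rng.normal(0, 0.1, (3 * d, d))        # latent-update weights
W2 = rng.normal(0, 0.1, (2 * d, d))        # answer-update weights

def refine_latent(x, y, z):
    return np.tanh(np.concatenate([x, y, z]) @ W1)   # z <- f(x, y, z)

def refine_answer(y, z):
    return np.tanh(np.concatenate([y, z]) @ W2)      # y <- g(y, z)

x = rng.normal(size=d)                     # encoded puzzle input
y, z = np.zeros(d), np.zeros(d)            # initial answer and latent
for _ in range(6):                         # outer improvement steps
    for _ in range(4):                     # inner latent-reasoning steps
        z = refine_latent(x, y, z)
    y = refine_answer(y, z)                # commit an improved answer

print("answer embedding norm:", np.linalg.norm(y))
```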


Burny - Effective Curiosity reposted

the new ai benchmark next year will be "when you ask a model to make you a $1b arr saas, how much money actually shows up in your bank account"

