Today we’re releasing our first public preview of ARC-AGI-3: the first three games. Version 3 is a big upgrade over v1 and v2, which were designed to challenge pure deep learning and static reasoning. In contrast, v3 challenges interactive reasoning (e.g., agents). The full version…
I've learned that building useful AI benchmarks has much in common with building useful products. You cannot design either in isolation. Both need strong contact with reality and iteration to make them great.
DNA is sometimes viewed as information compression. What minimal information is needed to create life/intelligence? But DNA is useless without a cell and its machinery. To fully describe life, you must describe both.
come hang with me + @SalimansRobin next week to hear about using RL environments for evaluating + optimizing agents in prod at @zapier :) will be demoing some fun new features we've been collaborating on 👀
"New ideas needed" for AGI has been the headline on arcprize.org since June 2024
"scaling sucked out all the oxygen in the room, everyone converged to the same ideas" --> new ideas still needed!
Ilya Sutskever: We are no longer in the age of scaling, we are back to the age of research
Really exciting to see! This is important work to assemble these scientific datasets. Also, one item close to my heart: > launch funding opportunities or prize competitions to incentivize private-sector participation in AI-driven scientific research
Very excited to see this AI for Science Executive Order—the Genesis Mission. The Administration has appropriately ambitious goals here; we may be on the verge of world-changing breakthroughs. Congratulations to all involved!
For those studying AI reasoning systems, Opus' token efficiency scaling curves on ARC v1 and v2 are worth looking at. Very clean-looking results. Raw data is here: huggingface.co/arcprize
Opus 4.5 (Thinking, 64k) on ARC-AGI Semi-Private Eval: ARC-AGI-1: 80.00%, $1.47/task; ARC-AGI-2: 37.64%, $2.40/task. New SOTA for released frontier models from @AnthropicAI
Anthropic's new Claude 4.5 Opus (Thinking 64k) is on par with Gemini 3 Pro, released just 1 week ago! These are both very impressive new results. Intentional strategy? Timing coincidence? Or are there simply no secrets?
Grid size is definitely correlated. But so is solution program length (e.g., Kolmogorov complexity). This would be the basis of a good experiment -- disentangle these.
I'm live now chatting about the Gemini 3 ARC results!
Good morning. On today’s show: – @mikeknoop (Ndea) – @JonnyNemo (Sweetgreen) – @ashleevance (Core Memory) – @jeremy_epling (Vanta) – @keoneHD (Monad) – @stephenbalaban (Lambda) See you on stream.
We just verified Gemini 3 Pro and Deep Think (Preview) are over 2X SOTA on ARC v2! This is really impressive and frankly a bit surprising. Impressive because many of the v2 solves indicate clear complexity scaling over v1, such as tasks 65b59efc, e3721c99, and dd6b8c4b. We’re…
Gemini 3 models from @Google @GoogleDeepMind have made a significant 2X SOTA jump on ARC-AGI-2 (Semi-Private Eval). Gemini 3 Pro: 31.11%, $0.81/task; Gemini 3 Deep Think (Preview): 45.14%, $77.16/task
One strange thing is that, despite significant inference cost reductions, the ARC v1 Pareto frontier continues to mostly hold up. You'd naively expect frontier AI to use cheap inference to get much more reasoning search coverage.
The rate of reduction in price per unit of intelligence has been the thing I've most consistently underestimated over the past couple of years. 300x in a year is nuts!
To materially beat 2% GDP growth we need AI capable of innovation. Unlike automation, innovation inherently requires the ability to adapt to change. This is what ARC-AGI measures.
Joking aside, here's my base case for thinking about AI's impact on GDP growth. I think we'll keep growing at 2%. Stuff like AI that comes along once in a while is how we've managed to grow at 2%/y, as we've done for millennia since the first industrial revolution. There’s no sense in which…
Wow! Unprecedented movement on the leaderboard over the last few days. ARC Prize 2025 is now closed. I'm looking forward to reviewing all the papers (still a few more days to submit those). We'll announce final results on December 5.
One day left to submit to ARC Prize 2025 on Kaggle! Big changes at the top of the leaderboard these past few days, with the rise of teams NVARC and North Stars. Close contest between GiottoAI and ARChitects for the top spot. Keep in mind the final score will be evaluated on a…
ARC Prize 2025 - 1 day left for Top Score submissions. The leaderboard is heating up, with over 1.4K teams participating. Guaranteed prizes: Top Score ($50K) - Highest private-set scores, Nov 3; Paper Prize ($75K) - Best conceptual progress, Nov 9. Grand Prize locked till 85%
At NeurIPS this year and interested in ARC? Come say hi to @fchollet and me on Saturday night!
NeurIPS Party - ARC Prize Foundation + Y Combinator Join us in San Diego for a NeurIPS party co-hosted with @ycombinator Meet ARC Prize and YC leadership, researchers and industry leaders pushing the boundaries of frontier AI San Diego 6-8 PM, December 6, 2025
This is the final week for ARC Prize 2025! And the paper prize deadline is one week after close. Last year, there was a ton of action in the final days. Good luck to all teams!
ARC Prize 2025 - 6 days to go. Over 1.3K teams have submitted 13.9K entries. Guaranteed prizes: Paper Prize ($75K) - Awarded to the best conceptual progress; Top Score ($50K) - Awarded to the submissions with the highest private-set scores. Winners announced Dec 5
.@fchollet + @mikeknoop fireside chat @ MIT. Listen to ARC Prize Co-Founders Francois Chollet + Mike Knoop talk about ARC-AGI-3, game development, and measuring intelligence with Interactive Benchmarks youtu.be/1u2DkoqEfhk