Mikel Bober-Irizar

@mikb0b

24 // Kaggle Competitions Grandmaster & ML/AI Researcher. Building video games @iconicgamesio, machine reasoning @Cambridge_CL, bioscience @ForecomAI.

London

mxbi.net

Joined August 2011

1K Posts · 8K Followers · 1K Following

Pinned

Mikel Bober-Irizar

@mikb0b

Dec 24

Why do pre-o3 LLMs struggle with generalization tasks like @arcprize? It's not what you might think. OpenAI o3 shattered the ARC-AGI benchmark. But the hardest puzzles didn’t stump it because of reasoning, and this has implications for the benchmark as a whole. Analysis below🧵

Mikel Bober-Irizar

@mikb0b

Mar 16

Really good to be back in SF for GDC (yes, our game is still cooking 👀) If you're around and want to meet up next week, let me know!

Reposted by Mikel Bober-Irizar

Greg Kamradt

@GregKamradt

Dec 25

Seeing this chart go around a bunch, I think the main point is being missed - “LLMs can’t solve large grids because of perception” This is a deficiency in the model, there are alternative ways to “perceive” the grid. Doing it in 1-shot is not required. As a human, do you hold…

Mikel Bober-Irizar

@mikb0b

Dec 24

LLMs get dramatically worse at ARC tasks as the grids get bigger. However, humans have no such issues: for them, ARC task difficulty is independent of size. Most ARC tasks contain around 512-2048 pixels, and o3 is the first model capable of operating on these text grids reliably.
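
To make the "pixels per task" figure concrete, here is a minimal Python sketch that counts grid cells in a task stored in the public ARC JSON layout ("train" and "test" lists of input/output grids). Summing over every input and output grid is an assumption about how the tweet counts size, and `grid_pixels`, `task_pixels`, and `toy_task` are hypothetical names used only for illustration.

```python
from typing import Dict, List

Grid = List[List[int]]  # a grid is a list of rows of color indices


def grid_pixels(grid: Grid) -> int:
    """Number of cells in one grid (rows may be counted individually)."""
    return sum(len(row) for row in grid)


def task_pixels(task: Dict[str, List[Dict[str, Grid]]]) -> int:
    """Total cells across all train/test input and output grids.

    Assumption: task size means the sum over every grid in the task.
    """
    total = 0
    for split in ("train", "test"):
        for pair in task.get(split, []):
            total += grid_pixels(pair["input"])
            # Test outputs may be withheld, so treat them as optional.
            if "output" in pair:
                total += grid_pixels(pair["output"])
    return total


# Toy task (made up for illustration): three pairs of 3x3 grids.
blank = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
toy_task = {
    "train": [
        {"input": blank, "output": blank},
        {"input": blank, "output": blank},
    ],
    "test": [{"input": blank, "output": blank}],
}

print(task_pixels(toy_task))
```

Under this counting, a task with a handful of 15x15 grids already lands inside the 512-2048 range mentioned above.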

Reposted by Mikel Bober-Irizar

Simone Romeo

@simone_m_romeo

Dec 25

I recommend reading @mikb0b's article on o3's performance on the ARC challenge. He argues that LLMs' struggle with ARC stems from their inability to easily process large 2D grids.

Mikel Bober-Irizar

@mikb0b

Dec 25

This is a really good observation! I wrote about it and analyzed why in this article: anokas.substack.com/p/llms-struggl…

Reposted by Mikel Bober-Irizar

Olcan

@olcan

Dec 24

more evidence (including experiments varying sizes of problems) that grid size alone plays a significant role in arc. this is obviously far from ideal for a reasoning benchmark and can hopefully get addressed in arc-agi-2

Mikel Bober-Irizar

@mikb0b

Dec 24

Reposted by Mikel Bober-Irizar

meg.ai 🇨🇦

@MeganRisdal

Dec 14

Really great to meet and catch up with @mikb0b in person after many years! 😄

Mikel Bober-Irizar

@mikb0b

Feb 27, 2024

I'm heading back to San Francisco for @Official_GDC 🎮 - if anyone's around the bay area late March and wants to meet up let me know!

Mikel Bober-Irizar

@mikb0b

Nov 4, 2023

I'll be speaking at @NVIDIA's AI & DS Virtual Summit about the journey to becoming the youngest Kaggle Grandmaster, along with @Rob_Mulla and @kagglingdieter. 🔥 Come and join us for a live Q&A on Wednesday 9th at 12pm PT (for free!) nvidia.com/en-us/events/a… @NVIDIAAI

Mikel Bober-Irizar

@mikb0b

Oct 20, 2023

I'm going to be in San Francisco in early November! ✈️ If anyone's in the bay area and wants to meet up, or if anyone knows any events I should check out, let me know! 😊

Mikel Bober-Irizar

@mikb0b

Oct 14, 2023

I've recently been playing with @fchollet's Abstraction and Reasoning Corpus, a really interesting benchmark for building systems that can reason. As part of that, I've just released a small 🐍 library for easily interacting with and visualising ARC: github.com/mxbi/arckit

Mikel Bober-Irizar

@mikb0b

May 5, 2023

Really proud to be published in a Nature Portfolio journal for the first time! We set a new SOTA for single-cell protein localisation on the @ProteinAtlas, building on our work in the 2nd HPA Kaggle comp. nature.com/articles/s4200… @ForecomAI @cvssp_research @d_minskiy