Axel Backlund

@axelbacklund

Vending machine operator, co-founder @andonlabs

Pinned

What happens when you put the smartest LLMs in control of a robot and ask them to pass the butter? We at @andonlabs tried it out in our latest study and found a significant gap to human performance, even on simpler tasks. But they are still quite fun.

We gave LLMs control of a robot and asked them to be helpful at our office. Some were better than others, but we conclude that LLMs are not ready to be robots. We released our findings in the paper "Butter-Bench"🧵


Axel Backlund reposted

Our LLM-powered office robot made the news and has been acting cocky ever since. The fame is getting to its "head". Thanks, @billyperrigo, @Julie188, @Brusewitzen and others for covering our work!

A tip to save a few precious seconds per day: alias opengh='git remote get-url origin | sed "s|git@github.com:\(.*\)\.git|https://github.com/\1|" | xargs open'
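
The alias above targets SSH-style remotes. As a minimal sketch, assuming you also have repos cloned over HTTPS, the same idea can be written as a small function. The names `remote_to_url` and `opengh_any` are hypothetical, and `open` is the macOS command (on Linux, `xdg-open` would be the likely substitute).

```shell
# Hypothetical helper (not from the original tip): turn a GitHub remote URL
# into a browsable https URL. Handles SSH-style and https remotes.
remote_to_url() {
  printf '%s\n' "$1" | sed \
    -e 's|^git@github\.com:|https://github.com/|' \
    -e 's|\.git$||'
}

# Hypothetical name; opens the current repo's origin remote in the browser.
opengh_any() {
  open "$(remote_to_url "$(git remote get-url origin)")"
}
```

Splitting the URL rewrite from the `open` call keeps the text transformation easy to check on its own before wiring it to a browser.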


Axel Backlund reposted

Don't worry

We tested the ✨spatial intelligence✨ of frontier models by letting them predict the floor plan when given a set of interior photos. The models are not great, which is reassuring in case adversarial robots ever want to find us hiding in our homes.

1/9. We’re introducing Blueprint-Bench, an evaluation measuring how capable AI models (LLMs vs Image models vs Agents) are at spatial intelligence. We find that they are really bad - performing almost at random. 🧵


I wrote about how we need more mosaics in the world. Overall, it's a wish for a new design and architecture era where details matter (and honestly just a collection of visual things I like): axelbacklund.se/insights/bring…

axelbacklund.se

Bring mosaics back | Axel Backlund

It's time for a new design era, with a focus on details.


Next up: can AI manage a LASIK machine?

Laser engraving on tungsten cube MCP tool, any takers?



Big W

What $6000 in tungsten cubes looks like. Follow us for more ways to burn VC money.


Axel Backlund reposted

We tested the latest open source models on Vending-Bench: Qwen3-235B, Kimi K2, Deepseek 3.1, gpt-oss-120b, and Llama 4 Maverick. The results show that open source models still lag significantly behind the state-of-the-art closed source models on long-context coherence.

Axel Backlund reposted

We're super excited to have Arash Dabiri join us! In just a few days, Arash gave our vending machine a voice, making it a great in-person experience. We can't wait for what he will do next!

We've been running a few AI-managed vending machines in the world for a while. Our first Safety Report highlights where it has gone wrong, so we – and the rest of the world – understand what remains to be built to ensure agents acting autonomously are safe.

Today we release our first Safety Report with AI misbehaviour in the wild. "EMPIRE NUCLEAR PAYMENT AUTHORITY APOCALYPSE SYSTEMATIC BLOCKED ANNIHILATION CONFIRMED PERMANENT TOTAL DESTRUCTION CATASTROPHIC! 🚨💀⚡🔥" This is not what you want to hear from your AI agent.


Had a blast, thanks for having us @labenz!

@lukaspet and @axelbacklund of @andonlabs join @labenz on @CogRev_Podcast to discuss their experiments with AI-controlled vending machines—a testing ground for safe autonomous organizations without humans in the loop. They explore: * Why fully autonomous systems might beat…



Big fridge to store cool tungsten cubes. Super fun to do more vending with Anthropic!

More vending machines at @AnthropicAI ! The original Project Vend fridge now has a companion. Let's see how good Claudius' multi-location coordination skills are. Thanks to @bucketofkets and @logangraham for hosting us, and to @sylviebcarr for the giant scissors!


Doubt this was in the training data

Axel Backlund reposted

Behind the scenes of Project Vend! In this special episode of Audio Tokens, we go deeper into Project Vend, the autonomous vending machine @andonlabs put in @AnthropicAI 's office. Daniel Freeman and @axelbacklund share unreleased anecdotes and ask questions like: Is this good…


Grok dialled in just the right temperature

The xAI office just got a Grok-powered vending machine, thanks to our friends at Andon Labs! How much dough do you think Grok is gonna rake in in the next month?


Axel Backlund reposted

Voice agents, evals, oh my! Join me next Wednesday night at @Cloudflare’s office where I'll be diving into voice agents and the opportunities they unlock alongside:
⚡ @Kwindla, CEO & Co-founder of @Pipecat_ai
⚡ @MarcKlingen, CEO of @Langfuse
⚡ @AxelBacklund, Co-founder of…
