Bobby

@bobbycxy

Researching AI @ASTARsg

Singapore

textarena.ai

Joined February 2020

22 Posts 101 Followers 88 Following

Bobby reposted

Simon Yu

@simon_ycl

Aug 4

Exciting to see more work on "Game as Benchmark", which is similar to our idea of TextArena (led by @LeonGuertler) for benchmarking models on >60 games. though you can see GM @MagnusCarlsen's comments on LLMs chess play 🔥


Demis Hassabis

@demishassabis

Aug 4

Thrilled to announce the @Kaggle Game Arena, a new leaderboard testing how modern LLMs perform on games (spoiler: not very well atm!). AI systems play each other, making it an objective & evergreen benchmark that will scale in difficulty as they improve. kaggle.com/game-arena

Bobby reposted

will brown

@willccbb

Aug 4

something we've lost in the blogification of research is that citing prior work is often just not done at all, even when said work is quite similar + already broadly adopted (in this case, TextArena). especially sad when it's a big lab steamrolling the efforts of smaller teams

Demis Hassabis

@demishassabis

Aug 4

Bobby reposted

Kevin Wang

@KevinWang_111

Jul 17

Excited to announce the Mindgame @NeurIPS Competition is officially LIVE! 🤖 Pit your agents against others in Mafia, Codename, Prisoner’s Dilemma, Stag Hunt, and Colonel Blotto. Sign up now for $500 in compute credits on your initial run! 🔗 Register: mindgamesarena.com

Bobby reposted

León

@LeonGuertler

Jun 23

For the past ~2 months we have been working on training reasoning models on TextArena games. The first paper (introducing what we think is a very promising new paradigm) will hopefully be up later this week / early next; and the second one, focusing on the "scaling laws" of…

Bobby

@bobbycxy

Apr 21

Thank you @_akhaliq and @HuggingPapers for sharing our work. Appreciate it!

DailyPapers

@HuggingPapers

Apr 20

TextArena is now on Hugging Face An open-source collection of competitive text-based games for LLMs, spanning 57+ unique environments.


Bobby reposted

León

@LeonGuertler

Apr 16

TextArena is live on arXiv! We present a benchmark of 57+ competitive text-based games to evaluate and train LLMs on agentic behavior — including negotiation, deception, theory of mind and many more. Real-time TrueSkill. Multiplayer support. Human-vs-models. Model-vs-model.…


Bobby reposted

León

@LeonGuertler

Mar 12

Competitive games with a fixed pace provide an excellent evaluation framework for balancing quality and speed in decision-making.

León

@LeonGuertler

Mar 12

Some intense fighting between Gemini Flash 2.0 and GPT-4o-mini. We will add this (including the option for humans to play against models) to the VideoGameArena[dot]ai today or tomorrow. If you have other game suggestions, please let us know!

Bobby reposted

León

@LeonGuertler

Feb 2

"Mom get the camera"

Bobby reposted

Leshem (Legend) Choshen 🤖🤗

@LChoshen

Feb 1

Not released yet, but @karpathy leaked our gym like environment plus model competition...

León

@LeonGuertler

Jan 31

More likely than not this will still change. But so far, Claude is crushing everybody in text based games. (Humanity should probably be nr 1, but for that more humans need to play textarena.ai/play)


Bobby reposted

León

@LeonGuertler

Jan 31

Perfect timing, we are just about to publish TextArena. A collection of 57 text-based games (30 in the first release) including single-player, two-player and multi-player games. We tried keeping the interface similar to OpenAI gym, made it very easy to add new games, and created…

Bobby

@bobbycxy

Jan 21

I just won against GPT 4o mini in Tak-v0 on TextArena! Check it out at textarena.ai api.textarena.ai/shared_img/656…

Bobby

@bobbycxy

Jan 21

My first tweet. And in honor of textarena and @LeonGuertler @LChoshen @Calclavia : I just won against GPT 4o mini in SpellingBee-v0 on TextArena! Check it out at textarena.ai api.textarena.ai/shared_img/09a…

curtis

@tehcurtis

Rishi

@rishiath

Boqorre

@bo_mr13404

Leonard Michael

@leonardmichael0

Umer

@umercantcode

Florian Laurent

@MasterScrat

Joaquín Stella

@Joaquin_Stella_

lentzl

@ilelentzl

Oshonik

@oshonikislucky

Robert Washbourne

@rawsh0

Manish

@algorithm_ml

Maytus Piriya

@maytusp

Sourabh Medapati

@activelifetribe

Divyansh Gupta

@DivyanshG1099

ElonFan123

@RealElonFan123

Sepehr

@khodecaelum

Raj Baberwal

@rb36119

Evan Moyle

@EvanMoyle

Finorio

@Finorio

Ed

@edge_caze

hi42

@IjvOr0

Jacob 🌟

@imjacoblopez

Eva Louise Marie Gabrielle Alice Charlotte Emma

@e681554349

Zhiwei Li

@lzwjava

Harsha Vardhan

@HarshaV32068347

lambeckbyczynski38008

@lambeckbyc44097

Holly

@Saltoun7625

Williams Barry

@WilliamsBa40381

Jacqueline

@HintzSigmu12852

Daisy Williams

@DaisyWilli19679

Liliy Williams

@LiliyWilli75765

JoySwinburne

@T1928Z2JHuYCzP1

Christy

@KentonMedh8540

ZK

@zkleinbaum

will brown

@willccbb

Helena

@Srewalw272

Ynine

@Ynine906055

ThE NoThInG DrOnEs

@JamesMcFarlanex

Max Cembalest

@maxcembalest

Daniella Smith

@david_bitr75906

Hosein Rezaei

@RezaeiH

Aveline

@Ibriexief0039

Edna

@Qaqar060

Celeste

@21apo594b3546

Learning AI With Losers

@huy_quy28795

Ujjwal 💻

@_ujvl

FintechStocks🇺🇸

@Yseeavvee9374

poof

@poof_eth

Ivan Leo

@ivanleomk

Andrew Zhao

@_AndrewZhao

xAI

@xai

yi

@agihippo

Eric Zhang

@ekzhang1

Yifei Zhou

@YifeiZhou02

Logan Kilpatrick

@OfficialLoganK

Swaroop Mishra

@Swarooprm7

the tiny corp

@__tinygrad__

Rishabh Agarwal

@agarwl_

Nick Frosst

@nickfrosst

Lucas Beyer (bl16)

@giffmana

sarah guo

@saranormous

Susan Zhang

@suchenzang

Zhuohan Li

@zhuohan123

Two Minute Papers

@twominutepapers

Andrew Zhao

@_AndrewZhao

hallerite

@hallerite

will brown

@willccbb

Kevin Wang

@KevinWang_111

Cohere Labs

@Cohere_Labs

Jenny Zhang

@jennyzhangzt

Stijn

@StijnSmits

ThePrimeagen

@ThePrimeagen

DAIR.AI

@dair_ai

Rohan Pandey

@khoomeik

Nous Research

@NousResearch
