Ricardo Dominguez-Olmedo
@rdolmedo_
PhD student at the Max Planck Institute for Intelligent Systems, working with Moritz Hardt and Bernhard Schölkopf.
After giving all 3 model families the same amount of preparation prior to evaluation, Pythia performs just as well as Llama and Qwen. Pythia got everything right back in October 2022! Since then, improvements in performance largely come from 1) scale and 2) test task training.
🚀 New Paper & Benchmark! Introducing MATH-Beyond (MATH-B), a new math reasoning benchmark deliberately constructed so that common open-source models (≤8B) fail even at pass@1024! Paper: arxiv.org/abs/2510.11653 Dataset: huggingface.co/datasets/brend… 🧵1/10
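For reference, pass@k is typically reported with the standard unbiased estimator from the HumanEval/Codex evaluation: given n sampled completions per problem of which c are correct, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch of that estimator; the function name and example numbers are illustrative, not taken from MATH-B:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total completions sampled per problem
    c: number of correct completions
    k: attempt budget
    """
    if n - c < k:
        return 1.0
    # Product form avoids overflowing the binomial coefficients.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative numbers only: a problem with 0 correct samples out of 1024
# contributes 0.0 to pass@1024, which is the regime MATH-B targets.
print(pass_at_k(n=1024, c=0, k=1024))  # 0.0
print(pass_at_k(n=1024, c=3, k=256))   # chance that at least one of 256 attempts is correct
```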
We (Moritz Hardt, @walesalaudeen96, @joavanschoren) are organizing the Workshop on the Science of Benchmarking & Evaluating AI @EurIPSConf 2025 in Copenhagen! 📢 Call for Posters: rb.gy/kyid4f 📅 Deadline: Oct 10, 2025 (AoE) 🔗 More Info: rebrand.ly/bg931sf
My PhD advisor, Moritz Hardt, has just released the first half of his new book, The Emerging Science of Machine Learning Benchmarks. It’s freely available and highly recommended: mlbenchmarks.org
New preprint out! 🎉🎉 How does LLM training loss translate to downstream performance? We show that pretraining data and tokenizer shape loss-to-loss scaling laws, while architecture and other factors play a surprisingly minor role! brendel-group.github.io/llm-line/ 🧵1/8
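As a rough illustration of what a loss-to-loss fit looks like: one common parameterization relates downstream loss to pretraining loss through a shifted power law. The data points, functional form, and parameter names below are placeholders for illustration, not the paper's actual fits:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (pretraining loss, downstream loss) pairs measured at several
# checkpoints / model sizes -- placeholder numbers, not data from the paper.
train_loss = np.array([3.2, 2.9, 2.7, 2.5, 2.35, 2.25])
downstream_loss = np.array([4.06, 3.59, 3.30, 3.02, 2.83, 2.71])

def shifted_power_law(l_train, k, kappa, e_train, e_down):
    """One common loss-to-loss parameterization:
    L_down = k * (L_train - e_train)^kappa + e_down."""
    return k * np.clip(l_train - e_train, 1e-6, None) ** kappa + e_down

params, _ = curve_fit(
    shifted_power_law, train_loss, downstream_loss,
    p0=[1.0, 1.0, 1.5, 1.5],
    bounds=([0.0, 0.1, 0.0, 0.0], [10.0, 5.0, 2.2, 4.0]),
    maxfev=20000,
)
print(dict(zip(["k", "kappa", "e_train", "e_down"], params)))

# Predict downstream loss for a new model's pretraining loss.
print(shifted_power_law(2.2, *params))
```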
“Aha moments” can be observed at step 0, so we should not fixate on reporting individual instances. Instead, we should seek reliable measures of internal reasoning that can be tracked throughout training. So far, response length appears to be one such (imperfect) measure.
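A minimal sketch of tracking that measure, assuming you have sampled responses from a fixed prompt set at several checkpoints; the checkpoint names, responses, and tokenizer below are placeholders:

```python
from transformers import AutoTokenizer

# Placeholder mapping from checkpoint to sampled responses; in practice these
# would be generations on the same prompt set at each training step.
responses_per_checkpoint = {
    "step_0": ["Let me think step by step...", "The answer is 12."],
    "step_500": ["First, note that...", "Checking my work, I see that..."],
}

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

for step, responses in responses_per_checkpoint.items():
    lengths = [len(tokenizer(r)["input_ids"]) for r in responses]
    print(step, "mean response length (tokens):", sum(lengths) / len(lengths))
```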
Self-reflection is not unique to “reasoning models” or to newer models. Here are some self-reflections produced by Llama 2 7B Chat, **WITHOUT** any RL fine-tuning.
Does reinforcement learning with verifiable rewards work only for recent model families? It turns out that GRPO also works very well for Llama 2 7B, with an impressive +15 accuracy point increase on GSM8K. GRPO over GSM8K train. No bells and whistles. It just works.
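A minimal sketch of this kind of run using TRL's GRPOTrainer; the answer-extraction regex, reward definition, and hyperparameters are assumptions for illustration, not necessarily the exact setup used here:

```python
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K train split; the reference answer follows "####" in each solution.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"],
                                 "answer": x["answer"].split("####")[-1].strip()})

def correctness_reward(completions, answer, **kwargs):
    """Verifiable reward: 1.0 if the last number in the completion matches
    the GSM8K reference answer, else 0.0."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+\.?\d*", completion.replace(",", ""))
        rewards.append(1.0 if numbers and numbers[-1].rstrip(".") == ref else 0.0)
    return rewards

trainer = GRPOTrainer(
    model="meta-llama/Llama-2-7b-chat-hf",
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="llama2-7b-grpo-gsm8k", num_generations=8),
    train_dataset=dataset,
)
trainer.train()
```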
One important caveat is that I cannot get the response length to dramatically increase as in the R1 paper.
R1-style GRPO on Llama 3.2 1B Instruct yields +10 accuracy points on GSM8K. It just works! The train data is GSM8K train. Interestingly, supervised fine-tuning yields no performance improvements, since the dataset is tiny compared to all the math reasoning data seen by Llama 3.
Really cool paper questioning all the 'incredible' progress we've seen recently: "after fine-tuning all models on the same amount of task data, performance per pre-training compute equalizes and newer models are no better than earlier models."
Models released after November 2023 strongly outperform earlier ones on MMLU and GSM8K. However, after fine-tuning all models on the same amount of task data, performance per pre-training compute equalizes and newer models are no better than earlier models.
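To make "performance per pre-training compute" concrete: pretraining compute is commonly approximated as 6·N·D FLOPs (N parameters, D training tokens), and accuracy after task fine-tuning is plotted against it. The model names and numbers below are placeholders, not results from the paper:

```python
import matplotlib.pyplot as plt

# Hypothetical (model, parameter count, pretraining tokens, accuracy after
# fine-tuning on the same amount of task data) -- placeholder numbers.
models = [
    ("older-7b",  7e9,  1.0e12, 0.42),
    ("newer-7b",  7e9,  2.0e12, 0.47),
    ("older-13b", 13e9, 1.0e12, 0.48),
]

for name, params, tokens, acc in models:
    flops = 6 * params * tokens  # standard ~6*N*D estimate of pretraining FLOPs
    plt.scatter(flops, acc, label=name)

plt.xscale("log")
plt.xlabel("Estimated pretraining FLOPs (6*N*D)")
plt.ylabel("Task accuracy after fine-tuning")
plt.legend()
plt.show()
```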