Fabian Schaipp

@FSchaipp

working on optimization for machine learning. currently postdoc @inria_paris.

Paris, France

fabian-sp.github.io

Joined July 2020

471 Posts, 1K Followers, 706 Following


Pinned

Fabian Schaipp

@FSchaipp

Feb 5

Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper 🚇 arxiv.org/abs/2501.18965

Aaron Defazio

@aaron_defazio

Feb 3

The sudden loss drop when annealing the learning rate at the end of a WSD (warmup-stable-decay) schedule can be explained without relying on non-convexity or even smoothness: a new paper shows that it can be precisely predicted by theory in the convex, non-smooth setting! 1/2


Fabian Schaipp

@FSchaipp

Nov 10

love to see how well MoMo works combined with Muon

Robert M. Gower 🇺🇦

@gowerrobert

Nov 6

We've just finished some work on improving the sensitivity of Muon to the learning rate, and exploring a lot of design choices. If you want to see how we did this, follow me... 1/x (Work led by the amazing @CrichaelMawshaw)


Fabian Schaipp reposted

Andrei Semenov

@AndreiSemenov17

Oct 23

Good to see SOAP and Muon being quite performant in another setting — training of Diffusion Models. Similarly to our benchmark, the authors find Prodigy a decent “proxy-optimizer” for tuning hyperparams of Adam-like methods arxiv.org/pdf/2510.19376


Fabian Schaipp

@FSchaipp

Oct 9

TIL: Even the Sophia authors couldn't reproduce the Sophia paper's results. source: arxiv.org/pdf/2509.02046

Fabian Schaipp

@FSchaipp

Sep 30

most stylish theatre i've been to. don't miss the coffee bar in the break.

lusso

@luusssso

Oct 27, 2024

The amazing lobby of the Teatro Regio in Turin, Italy by architect Carlo Mollino

Fabian Schaipp

@FSchaipp

Sep 30

when the paper title is a question, you can usually guess the "answer"

You Jiacheng

@YouJiacheng

Sep 30

It's only Monday

Fabian Schaipp

@FSchaipp

Sep 29

weight decay seems to be a hot topic of this year's ICLR submissions 👀

Fabian Schaipp

@FSchaipp

Sep 26

are models getting nervous when they are set from .train() to .eval()?

Fabian Schaipp reposted

Pratyush Maini

@pratyushmaini

Sep 25

If you’re scrambling a last-minute submission with an uncertain result, remember: putting it off is hard in the moment. It will sting for 10 minutes (because you care so deeply), but in 10 months you’ll be incredibly proud you made the scientifically rigorous call.

Fabian Schaipp

@FSchaipp

Sep 10

what a beauty

Fabian Schaipp reposted

Ingwar Perowanowitsch

@Perowinger94

Sep 8

Berlin's transport senator Uta Bonde (CDU), now speaking with Tagesspiegel about safe routes to school: "We can't just introduce 30 km/h speed limits at will." She is skeptical of car-free school streets like those in Paris. Instead, her advice to all children and teenagers: "Helmet…


Fabian Schaipp reposted

Alex Hägele

@haeggee

Sep 2

Long in the making, finally released: Apertus-8B and Apertus-70B, trained on 15T tokens of open data from over 1800 languages. Unique opportunity in academia to work on and train LLMs across the full-stack. We managed to pull off a pretraining run with some fun innovations, ...


CSCS Lugano

@cscsch

Sep 2

@EPFL, @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: cscs.ch/science/comput… #Apertus #AI