Fabian Schaipp

@FSchaipp

working on optimization for machine learning. currently postdoc @inria_paris.

Pinned

Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper 🚇 arxiv.org/abs/2501.18965

The sudden loss drop when annealing the learning rate at the end of a WSD (warmup-stable-decay) schedule can be explained without relying on non-convexity or even smoothness. A new paper shows that it can be precisely predicted by theory in the convex, non-smooth setting! 1/2


love to see how well MoMo works combined with Muon

We've just finished some work on improving the sensitivity of Muon to the learning rate, and exploring a lot of design choices. If you want to see how we did this, follow me ....1/x (Work led by the amazing @CrichaelMawshaw)


Fabian Schaipp reposted

Good to see SOAP and Muon being quite performant in another setting — training of Diffusion Models. Similarly to our benchmark, the authors find Prodigy a decent “proxy-optimizer” for tuning hyperparams of Adam-like methods arxiv.org/pdf/2510.19376

TIL: Even the Sophia authors couldn't reproduce the Sophia paper's results. source: arxiv.org/pdf/2509.02046

most stylish theatre i've been to. don't miss the coffee bar in the break.

The amazing lobby of the Teatro Regio in Turin, Italy by architect Carlo Mollino


when the paper title is a question, you can usually guess the "answer"

It's only Monday


weight decay seems to be a hot topic of this year's ICLR submissions 👀


are models getting nervous when they are set from .train() to .eval()?


Fabian Schaipp reposted

If you’re scrambling a last-minute submission with an uncertain result, remember: putting it off is hard in the moment. It will sting for 10 minutes (because you care so deeply), but in 10 months you’ll be incredibly proud you made the scientifically rigorous call.


what a beauty

Fabian Schaipp reposted

Berlin's transport senator Uta Bonde (CDU), now in conversation with the Tagesspiegel on the safety of routes to school: "We can't introduce a 30 km/h limit at will." She is skeptical of car-free school streets like those in Paris. Instead, her advice to all children and teenagers: "Helmet…

Fabian Schaipp reposted

Long in the making, finally released: Apertus-8B and Apertus-70B, trained on 15T tokens of open data from over 1800 languages. Unique opportunity in academia to work on and train LLMs across the full-stack. We managed to pull off a pretraining run with some fun innovations, ...

@EPFL , @ETH_en and #CSCS today released Apertus, Switzerland's first large-scale, multilingual language model (LLM). As a fully open LLM, it serves as a building block for developers and organizations to create their own applications: cscs.ch/science/comput… #Apertus #AI

