Pablo Montalvo

@m_olbap

ML Engineer @HuggingFace. Previously ML R&D @ Rakuten. Computer vision and NLP mixer, ex-physicist. Dice thrower, dreamer, learner. He/him. Usually friendly :)

Pablo Montalvo reposted

A few days ago, @thinkymachines released “LoRA Without Regret”, showing that LoRA can match full fine-tuning performance when configured right. Naturally, we decided to reproduce the results with TRL and release a guide
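Not the guide itself, but a minimal sketch of the kind of TRL + PEFT setup it describes. Model, dataset, and hyperparameters below are illustrative placeholders, not the exact values from the reproduction:

```python
# Minimal LoRA fine-tune with TRL + PEFT, in the spirit of the guide.
# Model name, dataset, and hyperparameters are illustrative.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=256,                        # high rank: one of the "configured right" knobs
    lora_alpha=16,
    target_modules="all-linear",  # apply LoRA to all linear layers, incl. MLPs
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen-lora", learning_rate=1e-4),  # ~10x a typical full-FT LR
    peft_config=peft_config,
)
trainer.train()
```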

Pablo Montalvo reposted

You need to try this tool! 🫡 My colleague @m_olbap built an interactive HF Space to explore the modular support of open models in transformers over time 👀 You’ll spot things like 🦙 llama defining many models or which ones could be modular next


Pablo Montalvo reposted

Why is your KV so small? 🤏 In continuous batching, if you increase the max number of tokens per batch, you must decrease the memory allocated for your cache. In transformers, we make sure they are perfectly balanced (as all things should be). No matter how big your model is🦠🐋
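A back-of-envelope sketch of that balance, using an illustrative Llama-style geometry (the exact accounting inside transformers' scheduler may differ):

```python
# KV-cache / batch-size trade-off in continuous batching, back of the envelope.

def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    # 2x for the K and V tensors, stored at every layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# Llama-3-8B-ish geometry: 32 layers, 8 KV heads (GQA), head_dim 128, fp16.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)

cache_budget = 8 * 1024**3  # say 8 GiB of GPU memory left for the cache
max_cache_tokens = cache_budget // per_token

# Every token admitted into the running batch occupies one cache slot, so the
# scheduler must keep (tokens in flight) <= max_cache_tokens: raising the max
# tokens per batch only works if the cache can still hold all of them.
print(f"{per_token} bytes/token -> {max_cache_tokens:,} cache slots in 8 GiB")
```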


Pablo Montalvo reposted

You have no idea what attention looks like 🤥 Many talk about attention like it's simple, but few know how it actually works. Even basic stuff like shapes and prefill / decode are not that easy to grasp. Good thing HF is cooking a blogpost to help you out 🫂
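A shape-level sketch of the prefill/decode difference, with illustrative sizes and PyTorch's built-in SDPA (not any particular model):

```python
import torch
import torch.nn.functional as F

batch, heads, head_dim, prompt_len = 1, 8, 64, 10

# Prefill: attend over the whole prompt at once; Q and K/V share a seq length.
q = torch.randn(batch, heads, prompt_len, head_dim)
k = torch.randn(batch, heads, prompt_len, head_dim)
v = torch.randn(batch, heads, prompt_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # (1, 8, 10, 64): one output per prompt token

# Decode: a single new query token, but K/V cover prompt + generated tokens
# (the KV cache), so Q and K/V seq lengths no longer match. No causal mask is
# needed: the newest token may attend to everything before it.
k_cache = torch.cat([k, torch.randn(batch, heads, 1, head_dim)], dim=2)
v_cache = torch.cat([v, torch.randn(batch, heads, 1, head_dim)], dim=2)
q_new = torch.randn(batch, heads, 1, head_dim)
out = F.scaled_dot_product_attention(q_new, k_cache, v_cache)
print(out.shape)  # (1, 8, 1, 64): the single new token's output
```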

Ever wondered how models actually see an image? I've been playing with visualizations of patch extraction, token layouts, and how they affect predictions. Planning a short visual deep dive comparing how different models process images. Would love thoughts before I go on.
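For the curious, a minimal sketch of the patch extraction step being visualized, ViT-style (224x224 image, 16x16 patches; sizes are illustrative):

```python
# How an image becomes a sequence of tokens: cut into patches, then embed.
import torch

image = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
patch = 16

# Cut the image into non-overlapping 16x16 patches and flatten each one.
patches = image.unfold(2, patch, patch).unfold(3, patch, patch)  # (1, 3, 14, 14, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 14 * 14, 3 * patch * patch)
print(patches.shape)  # (1, 196, 768): 196 tokens of dimension 768

# A real model projects each flattened patch with a linear layer (equivalently
# a Conv2d with kernel_size = stride = patch) to get the patch embeddings.
embed = torch.nn.Conv2d(3, 768, kernel_size=patch, stride=patch)
tokens = embed(image).flatten(2).transpose(1, 2)
print(tokens.shape)  # (1, 196, 768): the token layout the transformer sees
```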


Pablo Montalvo reposted

A quick update on the future of the `transformers` library! In order to provide a source of truth for all models, we are working with the rest of the ecosystem to make the modeling code the standard. A joint effort with vLLM, llama.cpp, SGLang, MLX, Qwen, GLM, Unsloth, Axolotl,…


Pablo Montalvo reposted

The Transformers library is undergoing its largest pivot to date 🙌 It now cements its role as the central model definition, irrespective of the backend and runner. One ground truth to bring more reliability across the ecosystem. Why is this important?
