
Mattia Verasani

@MatRazor

Mattia Verasani reposted

Understanding neural networks through sparse circuits:

We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach…
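The thread does not include code, but as a rough illustration of the general recipe (a hypothetical toy setup, not OpenAI's actual method), one way to push a small model toward sparse, circuit-like internals is to add a weight-sparsity penalty to the training loss:

```python
import torch
import torch.nn as nn

# Toy MLP; in practice this would be a small transformer.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_coeff = 1e-4  # placeholder sparsity strength

x, y = torch.randn(128, 32), torch.randint(0, 2, (128,))
for _ in range(100):
    logits = model(x)
    task_loss = nn.functional.cross_entropy(logits, y)
    # The L1 penalty pushes most weights toward zero, leaving a sparse "circuit".
    sparsity = sum(p.abs().sum() for p in model.parameters())
    loss = task_loss + l1_coeff * sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```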



Mattia Verasani reposted

We’re excited to share details on Meta’s Generative Ads Recommendation Model (GEM), a new foundational model built with LLM-scale techniques that’s already helping create more value for businesses, such as a +5% increase in ad conversions on Instagram. Dive deep into the technology…


Mattia Verasani reposted

Ilya Sutskever "Three lines of math can prove all of supervised learning" (4:33) "I have not seen an exposition of unsupervised learning that I found satisfying" (7:50) Optimization objective has little relation to the actual objective you care about! youtube.com/watch?v=AKMuA_…


Mattia Verasani reposted

Side effect of blocking Chinese firms from buying the best NVIDIA cards: top models are now explicitly being trained to work well on older/cheaper GPUs. The new SoTA model from @Kimi_Moonshot uses plain old BF16 ops (after dequant from INT4); no need for expensive FP4 support.


🚀 "Quantization is not a compromise — it's the next paradigm." After K2-Thinking's release, many developers have been curious about its native INT4 quantization format. 刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…



Mattia Verasani reposted

🚨 Anthropic just solved the problem every AI agent engineer’s been screaming about for a year. Every agent today burns tokens like fuel: every tool call, every definition, every intermediate result jammed into context. Now Anthropic’s introducing the fix: code execution with…
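A rough sketch of the pattern being described, under the assumption that the agent emits code against tool bindings and runs it in a sandbox; the tool names and orchestration below are hypothetical, not Anthropic's API. Intermediate tool results stay in the program, and only the final answer re-enters the model's context.

```python
# Hypothetical tools the agent can call from generated code.
def search_orders(customer_id: str) -> list[dict]:
    return [{"id": "o1", "total": 120.0}, {"id": "o2", "total": 80.0}]

def get_refunds(order_id: str) -> float:
    return 15.0 if order_id == "o1" else 0.0

# Code the agent would emit and run in a sandbox: intermediate results
# (full order lists, per-order refunds) never enter the model's context.
def agent_program(customer_id: str) -> str:
    orders = search_orders(customer_id)
    net = sum(o["total"] - get_refunds(o["id"]) for o in orders)
    return f"Net spend for {customer_id}: ${net:.2f}"

# Only this short string goes back into the conversation.
print(agent_program("cust_42"))
```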


Mattia Verasani reposted

I’m working on a new thing, we’re so back…

Introducing OlmoEarth 🌍, state-of-the-art AI foundation models paired with ready-to-use open infrastructure to turn Earth data into clear, up-to-date insights within hours—not years.



Mattia Verasani reposted

Hi @JeffDean, what’s the plan for releasing the code for this line of work? None of these papers so far seem to have released any code


An exciting new approach to continual learning, using nested optimization to enhance long-context processing.



Mattia Verasani reposted

And I bet half of you didn't catch the second glaring issue in this screenshot, because I didn't highlight it.

Wow!! Google discovering AND OPEN-SOURCING the latest training techniques such as supervised finetuning (SFT) wasn't on my bingo card. Soon they will have caught up with the frontier, and are sharing this with all of us!



Mattia Verasani reposted

An exciting new approach to continual learning, using nested optimization to enhance long-context processing.

Introducing Nested Learning: A new ML paradigm for continual learning that views models as nested optimization problems to enhance long context processing. Our proof-of-concept model, Hope, shows improved performance in language modeling. Learn more: goo.gle/47LJrzI
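As a rough mental model only (my paraphrase of "models as nested optimization problems", not the paper's exact formulation), the setup can be read as a bilevel objective where fast inner parameters adapt to the current context and slow outer parameters are trained across contexts:

```latex
% Inner level: fast weights \phi adapt to the current context c.
\phi^{*}(\theta, c) = \arg\min_{\phi}\; \mathcal{L}_{\mathrm{inner}}(\phi;\, \theta, c)
% Outer level: slow weights \theta are trained across contexts,
% through the adapted inner solution.
\theta^{*} = \arg\min_{\theta}\;
  \mathbb{E}_{c}\!\left[\mathcal{L}_{\mathrm{outer}}\bigl(\phi^{*}(\theta, c),\, \theta;\, c\bigr)\right]
```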



Mattia Verasani reposted

Future models will be multimodal-in, multimodal-out, potentially combining autoregressive and diffusion architectures. The SGLang project takes the first step towards building a unified inference stack for all.

🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models.
⚡️ Up to 5.9× faster inference
🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux
🧰 Easy to use via OpenAI-compatible API, CLI & Python API…
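Since the post advertises an OpenAI-compatible API, here is a hedged usage sketch with the standard openai client pointed at a local server; the port, endpoint, and model id are placeholders, so check the SGLang Diffusion docs for the real values.

```python
from openai import OpenAI

# Assumes a locally running SGLang Diffusion server exposing an
# OpenAI-compatible /v1 endpoint; port and model id are placeholders.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

result = client.images.generate(
    model="Qwen/Qwen-Image",  # placeholder model id
    prompt="a watercolor painting of a lighthouse at dawn",
    size="1024x1024",
)
print(result.data[0].url or result.data[0].b64_json)
```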



Mattia Verasani reposted

I think it's pretty wild that there's still no (publicly known) larger models than the Switch Transformer at 1.6T params, which was:
- trained 2020, ie 5y ago
- open-weights
- by Barret, Liam, and Noam, what a line-up!


Apple just leaked the size of Gemini 3 Pro - 1.2T params



Mattia Verasani reposted

New paper from Samsung Research introduces zFLoRA, a fine-tuning adapter that keeps LLM inference speed unchanged. It adds zero latency, while LoRA can push prefill to 2.5x and decode to 1.6x. Latency comes from the extra matrix multiplies and memory copies that adapters add, not from…
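To make the latency claim concrete, a small sketch of where plain LoRA's inference overhead comes from: the unmerged adapter adds an extra low-rank matmul (and an add) on every forward pass, which merging avoids at the cost of baking the adapter into the weights. This is generic LoRA arithmetic, not zFLoRA's method.

```python
import torch

d, r = 4096, 16
W = torch.randn(d, d)                          # frozen base weight
A, B = torch.randn(r, d), torch.randn(d, r)    # LoRA factors
x = torch.randn(1, d)

# Unmerged LoRA at inference: the extra B @ (A @ x) matmul (plus the add)
# is the source of the reported prefill/decode slowdown.
y_unmerged = x @ W.t() + (x @ A.t()) @ B.t()

# Merged weights remove the overhead but bake the adapter into W,
# which makes switching between many task adapters harder.
W_merged = W + B @ A
y_merged = x @ W_merged.t()

assert torch.allclose(y_unmerged, y_merged, atol=1e-2)
```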


Mattia Verasani reposted

Ah! If you recently came across claims like "A100 are known bad for RL" on your feed and, like me, you raised an eyebrow (because how on earth does such a statement make any sense?!), here is the likely resolution:

@danielhanchen, glad you liked the post! You're spot on to suspect lower-level implementation issues. That's exactly what we found in the original blog. The disable_cascade_attn finding (Sec 4.2.4) was the symptom, but the root cause was that silent FlashAttention-2 kernel bug…



Mattia Verasani reposted

🚀Excited to team up with @NVIDIAAIDev to bring Nemotron Nano 2 VL to vLLM - a multimodal model powered by a hybrid Transformer–Mamba language backbone, built for video understanding and document intelligence✨ Full post here👇blog.vllm.ai/2025/10/31/run…
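For anyone who just wants to try it, a hedged offline-inference sketch with vLLM's Python API; the model id below is a placeholder, and the actual checkpoint name plus how to pass image/video inputs are in the linked blog post.

```python
from vllm import LLM, SamplingParams

# Placeholder model id; see the linked vLLM blog post for the exact
# checkpoint name and the multimodal input arguments.
llm = LLM(model="nvidia/Nemotron-Nano-2-VL", trust_remote_code=True)

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["Describe the key findings in this document."], params)
print(outputs[0].outputs[0].text)
```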


Mattia Verasani reposted

Introducing Aardvark, our agentic security researcher:

Now in private beta: Aardvark, an agent that finds and fixes security bugs using GPT-5. openai.com/index/introduc…



Mattia Verasani reposted

We're taking the next big step with Researcher. With Computer Use, it can now securely browse the open and gated web to find hard-to-locate information—even across hundreds of sites—and handle multi-step tasks to uncover insights, take action, and create richer reports.


Mattia Verasani reposted

LMCache joins the #PyTorch Ecosystem, advancing scalable #LLM inference through integration with @vllm_project. Developed at the University of Chicago, LMCache reuses and shares KV caches across queries and engines, achieving up to 15× faster throughput. 🔗…
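The core idea (prefix KV-cache reuse) can be sketched independently of LMCache's actual API: queries that share a prompt prefix skip recomputing its KV tensors and only prefill the new suffix. The store below is a toy illustration, not LMCache code.

```python
from dataclasses import dataclass, field

@dataclass
class PrefixKVStore:
    # Maps a tuple of token ids to the KV tensors computed for that prefix.
    cache: dict[tuple, object] = field(default_factory=dict)

    def get_or_compute(self, token_ids: list[int], compute_kv):
        # Find the longest cached prefix of this query.
        for cut in range(len(token_ids), 0, -1):
            key = tuple(token_ids[:cut])
            if key in self.cache:
                cached = self.cache[key]
                break
        else:
            cut, cached = 0, None
        # Only the uncached suffix needs a prefill pass.
        new_kv = compute_kv(token_ids[cut:], past=cached)
        self.cache[tuple(token_ids)] = new_kv
        return new_kv

store = PrefixKVStore()
kv = store.get_or_compute([1, 2, 3, 4], compute_kv=lambda toks, past: ("kv", past, toks))
```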


Mattia Verasani reposted

We’ve cooked another one of these 200+ page practical books on model training that we love to write. This time it’s on all the pretraining and post-training recipes and how to run hyperparameter exploration for a training project. Closing the trilogy of: 1. Building a pretraining…

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably: huggingface.co/spaces/Hugging…



Mattia Verasani reposted

New diffusion tutorial dropped: arxiv.org/abs/2510.21890. Looks great, particularly from an ML folks' perspective. Would say 6-8 on this scale. Just working your way up the diffusion tutorial ladder for 2-3 years would be a pretty strong advanced undergrad curriculum.

Lazytwitter: can you reply with your favorite diffusion tutorial for PhDs and a number between 1-10 for its complexity? (1 = it makes images good, 10 = it's just non-equilibrium thermodynamics)



Mattia Verasani reposted

Next in our PyTorch Compiler Video Series, Sayak Paul introduces Diffusers, a Python library for state-of-the-art video, image, and audio generation, highlighting how it pairs torch.compile with features like offloading, LoRA, and quantization for performance benefits. ▶️ Watch the…
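A minimal sketch of the torch.compile usage the video covers, applied to a Diffusers pipeline; the checkpoint is just an example, and the compile mode and VRAM requirements will vary.

```python
import torch
from diffusers import DiffusionPipeline

# Example checkpoint; any Diffusers pipeline with a UNet/transformer works similarly.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Compile the denoiser, usually the bulk of per-step compute.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse", num_inference_steps=30).images[0]
image.save("astronaut.png")
```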

