
Mattia Verasani

@MatRazor

Mattia Verasani reposted

Understanding neural networks through sparse circuits:

We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach…
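The thread does not include code, but as a rough illustration of the general recipe (a hypothetical toy setup, not OpenAI's actual method), one way to push a small model toward sparse, circuit-like internals is to add a weight-sparsity penalty to the training loss:

```python
import torch
import torch.nn as nn

# Toy MLP; in practice this would be a small transformer.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_coeff = 1e-4  # placeholder sparsity strength

x, y = torch.randn(128, 32), torch.randint(0, 2, (128,))
for _ in range(100):
    logits = model(x)
    task_loss = nn.functional.cross_entropy(logits, y)
    # The L1 penalty pushes most weights toward zero, leaving a sparse "circuit".
    sparsity = sum(p.abs().sum() for p in model.parameters())
    loss = task_loss + l1_coeff * sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```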



Mattia Verasani reposted

We’re excited to share details on Meta’s Generative Ads Recommendation Model (GEM), a new foundational model built with LLM-scale techniques that’s already helping create more value for businesses, such as a +5% increase in ad conversions on Instagram. Dive deep into the technology…


Mattia Verasani reposted

Ilya Sutskever "Three lines of math can prove all of supervised learning" (4:33) "I have not seen an exposition of unsupervised learning that I found satisfying" (7:50) Optimization objective has little relation to the actual objective you care about! youtube.com/watch?v=AKMuA_…


Mattia Verasani reposted

Side effect of blocking Chinese firms from buying the best NVIDIA cards: top models are now explicitly being trained to work well on older/cheaper GPUs. The new SoTA model from @Kimi_Moonshot uses plain old BF16 ops (after dequant from INT4); no need for expensive FP4 support.


🚀 "Quantization is not a compromise — it's the next paradigm." After K2-Thinking's release, many developers have been curious about its native INT4 quantization format. 刘少伟, infra engineer at @Kimi_Moonshot and Zhihu contributor, shares an insider's view on why this choice…



Mattia Verasani reposted

🚨 Anthropic just solved the problem every AI agent engineer’s been screaming about for a year. Every agent today burns tokens like fuel: every tool call, every definition, every intermediate result jammed into context. Now Anthropic’s introducing the fix: code execution with…
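A rough sketch of the pattern being described, under the assumption that the agent emits code against tool bindings and runs it in a sandbox; the tool names and orchestration below are hypothetical, not Anthropic's API. Intermediate tool results stay in the program, and only the final answer re-enters the model's context.

```python
# Hypothetical tools the agent can call from generated code.
def search_orders(customer_id: str) -> list[dict]:
    return [{"id": "o1", "total": 120.0}, {"id": "o2", "total": 80.0}]

def get_refunds(order_id: str) -> float:
    return 15.0 if order_id == "o1" else 0.0

# Code the agent would emit and run in a sandbox: intermediate results
# (full order lists, per-order refunds) never enter the model's context.
def agent_program(customer_id: str) -> str:
    orders = search_orders(customer_id)
    net = sum(o["total"] - get_refunds(o["id"]) for o in orders)
    return f"Net spend for {customer_id}: ${net:.2f}"

# Only this short string goes back into the conversation.
print(agent_program("cust_42"))
```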


Mattia Verasani reposted

I’m working on a new thing, we’re so back…

Introducing OlmoEarth 🌍, state-of-the-art AI foundation models paired with ready-to-use open infrastructure to turn Earth data into clear, up-to-date insights within hours—not years.



Mattia Verasani reposted

Hi @JeffDean, what’s the plan for releasing the code for this line of work? None of these papers so far seem to have released any code


An exciting new approach to continual learning, using nested optimization to enhance long-context processing.



Mattia Verasani reposted

And I bet half of you didn't catch the second glaring issue in this screenshot, because I didn't highlight it.

Wow!! Google discovering AND OPEN-SOURCING the latest training techniques such as supervised finetuning (SFT) wasn't on my bingo card. Soon they will have caught up with the frontier, and are sharing this with all of us!



Mattia Verasani reposted

An exciting new approach to continual learning, using nested optimization to enhance long-context processing.

Introducing Nested Learning: A new ML paradigm for continual learning that views models as nested optimization problems to enhance long context processing. Our proof-of-concept model, Hope, shows improved performance in language modeling. Learn more: goo.gle/47LJrzI
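As a rough mental model only (my paraphrase of "models as nested optimization problems", not the paper's exact formulation), the setup can be read as a bilevel objective where fast inner parameters adapt to the current context and slow outer parameters are trained across contexts:

```latex
% Inner level: fast weights \phi adapt to the current context c.
\phi^{*}(\theta, c) = \arg\min_{\phi}\; \mathcal{L}_{\mathrm{inner}}(\phi;\, \theta, c)
% Outer level: slow weights \theta are trained across contexts,
% through the adapted inner solution.
\theta^{*} = \arg\min_{\theta}\;
  \mathbb{E}_{c}\!\left[\mathcal{L}_{\mathrm{outer}}\bigl(\phi^{*}(\theta, c),\, \theta;\, c\bigr)\right]
```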



Mattia Verasani reposted

Future models will be multimodal-in, multimodal-out, potentially combining autoregressive and diffusion architectures. The SGLang project takes the first step towards building a unified inference stack for all.

🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models.
⚡️ Up to 5.9× faster inference
🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux
🧰 Easy to use via OpenAI-compatible API, CLI & Python API…
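Since the post advertises an OpenAI-compatible API, here is a hedged usage sketch with the standard openai client pointed at a local server; the port, endpoint, and model id are placeholders, so check the SGLang Diffusion docs for the real values.

```python
from openai import OpenAI

# Assumes a locally running SGLang Diffusion server exposing an
# OpenAI-compatible /v1 endpoint; port and model id are placeholders.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

result = client.images.generate(
    model="Qwen/Qwen-Image",  # placeholder model id
    prompt="a watercolor painting of a lighthouse at dawn",
    size="1024x1024",
)
print(result.data[0].url or result.data[0].b64_json)
```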



Mattia Verasani reposted

I think it's pretty wild that there's still no (publicly known) larger models than the Switch Transformer at 1.6T params, which was:
- trained 2020, ie 5y ago
- open-weights
- by Barret, Liam, and Noam, what a line-up!


Apple just leaked the size of Gemini 3 Pro - 1.2T params



Mattia Verasani reposted

New paper from Samsung Research introduces zFLoRA, a fine-tuning adapter that keeps LLM inference speed unchanged. It adds zero latency, while LoRA can push prefill to 2.5x and decode to 1.6x. Latency comes from the extra matrix multiplies and memory copies that adapters add, not from…
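To make the latency claim concrete, a small sketch of where plain LoRA's inference overhead comes from: the unmerged adapter adds an extra low-rank matmul (and an add) on every forward pass, which merging avoids at the cost of baking the adapter into the weights. This is generic LoRA arithmetic, not zFLoRA's method.

```python
import torch

d, r = 4096, 16
W = torch.randn(d, d)                          # frozen base weight
A, B = torch.randn(r, d), torch.randn(d, r)    # LoRA factors
x = torch.randn(1, d)

# Unmerged LoRA at inference: the extra B @ (A @ x) matmul (plus the add)
# is the source of the reported prefill/decode slowdown.
y_unmerged = x @ W.t() + (x @ A.t()) @ B.t()

# Merged weights remove the overhead but bake the adapter into W,
# which makes switching between many task adapters harder.
W_merged = W + B @ A
y_merged = x @ W_merged.t()

assert torch.allclose(y_unmerged, y_merged, atol=1e-2)
```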


Mattia Verasani reposted

Ah! If you recently came across claims like "A100 are known bad for RL" on your feed and, like me, you raised an eyebrow (because how on earth does such a statement make any sense?!), here is the likely resolution:

@danielhanchen, glad you liked the post! You're spot on to suspect lower-level implementation issues. That's exactly what we found in the original blog. The disable_cascade_attn finding (Sec 4.2.4) was the symptom, but the root cause was that silent FlashAttention-2 kernel bug…



Mattia Verasani reposted

🚀Excited to team up with @NVIDIAAIDev to bring Nemotron Nano 2 VL to vLLM - a multimodal model powered by a hybrid Transformer–Mamba language backbone, built for video understanding and document intelligence✨ Full post here👇blog.vllm.ai/2025/10/31/run…
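For anyone who just wants to try it, a hedged offline-inference sketch with vLLM's Python API; the model id below is a placeholder, and the actual checkpoint name plus how to pass image/video inputs are in the linked blog post.

```python
from vllm import LLM, SamplingParams

# Placeholder model id; see the linked vLLM blog post for the exact
# checkpoint name and the multimodal input arguments.
llm = LLM(model="nvidia/Nemotron-Nano-2-VL", trust_remote_code=True)

params = SamplingParams(temperature=0.2, max_tokens=128)
outputs = llm.generate(["Describe the key findings in this document."], params)
print(outputs[0].outputs[0].text)
```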


Mattia Verasani reposted

Introducing Aardvark, our agentic security researcher:

Now in private beta: Aardvark, an agent that finds and fixes security bugs using GPT-5. openai.com/index/introduc…



Mattia Verasani reposted

We're taking the next big step with Researcher. With Computer Use, it can now securely browse the open and gated web to find hard-to-locate information—even across hundreds of sites—and handle multi-step tasks to uncover insights, take action, and create richer reports.


Mattia Verasani reposted

LMCache joins the #PyTorch Ecosystem, advancing scalable #LLM inference through integration with @vllm_project. Developed at the University of Chicago, LMCache reuses and shares KV caches across queries and engines, achieving up to 15× faster throughput. 🔗…
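The core idea (prefix KV-cache reuse) can be sketched independently of LMCache's actual API: queries that share a prompt prefix skip recomputing its KV tensors and only prefill the new suffix. The store below is a toy illustration, not LMCache code.

```python
from dataclasses import dataclass, field

@dataclass
class PrefixKVStore:
    # Maps a tuple of token ids to the KV tensors computed for that prefix.
    cache: dict[tuple, object] = field(default_factory=dict)

    def get_or_compute(self, token_ids: list[int], compute_kv):
        # Find the longest cached prefix of this query.
        for cut in range(len(token_ids), 0, -1):
            key = tuple(token_ids[:cut])
            if key in self.cache:
                cached = self.cache[key]
                break
        else:
            cut, cached = 0, None
        # Only the uncached suffix needs a prefill pass.
        new_kv = compute_kv(token_ids[cut:], past=cached)
        self.cache[tuple(token_ids)] = new_kv
        return new_kv

store = PrefixKVStore()
kv = store.get_or_compute([1, 2, 3, 4], compute_kv=lambda toks, past: ("kv", past, toks))
```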


Mattia Verasani reposted

We’ve cooked another one of these 200+ page practical books on model training that we love to write. This time it’s on all the pretraining and post-training recipes and how to run hyperparameter exploration for a training project. Closing the trilogy of: 1. Building a pretraining…

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training, and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably: huggingface.co/spaces/Hugging…



Mattia Verasani reposted

New diffusion tutorial dropped: arxiv.org/abs/2510.21890. Looks great, particularly from an ML folks' perspective. Would say 6-8 on this scale. Just working your way up the diffusion tutorial ladder for 2-3 years would be a pretty strong advanced undergrad curriculum.

Lazytwitter: can you reply with your favorite diffusion tutorial for PhDs and a number between 1-10 for its complexity? (1 = it makes images good, 10 = it's just non-equilibrium thermodynamics)



Mattia Verasani reposted

Next in our PyTorch Compiler Video Series, Sayak Paul introduces Diffusers, a Python library for state-of-the-art video, image, and audio generation, highlighting how it pairs torch.compile with features like offloading, LoRA, and quantization for performance benefits. ▶️ Watch the…
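A minimal sketch of the torch.compile usage the video covers, applied to a Diffusers pipeline; the checkpoint is just an example, and the compile mode and VRAM requirements will vary.

```python
import torch
from diffusers import DiffusionPipeline

# Example checkpoint; any Diffusers pipeline with a UNet/transformer works similarly.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Compile the denoiser, usually the bulk of per-step compute.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("a photo of an astronaut riding a horse", num_inference_steps=30).images[0]
image.save("astronaut.png")
```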

