
Diogo Fernandes

@dioogfernands

machine learning enthusiast

Diogo Fernandes reposted

Our paper "MaskControl: Spatio-Temporal Control for Masked Motion Synthesis" has been selected as an 🏆 Award Candidate at ICCV 2025! 🌺✨ It’s a huge honor to see our work recognized among the top papers this year.


Diogo Fernandes reposted

framework for domain-specific knowledge extraction and reasoning with LLMs

Diogo Fernandes reposted

New blog post:

Diogo Fernandes reposted

New APPLE paper says a small base model plus fetched memories can act like a bigger one. With about 10% extra fetched parameters, a 160M model matches models over 2x its size. Packing all facts into fixed weights wastes memory and compute because each query needs very little.…

Diogo Fernandes reposted

Import 3D models, auto-rig and export animations in your browser

Diogo Fernandes reposted

The paper links Kolmogorov complexity to Transformers and proposes loss functions that become provably best as model resources grow. It treats learning as compression: minimize the bits to describe the model plus the bits to describe the labels. Provides a single training target that…

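As a rough illustration of the "bits to describe the model plus bits to describe the labels" idea, here is a generic two-part MDL sketch. This is not the paper's actual loss; the Gaussian weight prior and its scale are my assumptions.

```python
import math
import torch
import torch.nn.functional as F

def two_part_description_length(model, logits, labels, sigma=0.1):
    """Two-part MDL-style objective: bits(model) + bits(labels | model).

    Illustrative only: the model term is the negative log-density of the
    weights under a zero-mean Gaussian prior with std `sigma` (an assumed
    prior, not from the paper); the data term is the cross-entropy of the
    labels in bits, i.e. their codelength under the model's predictions.
    """
    # Bits to describe the labels given the model (nats -> bits).
    data_nats = F.cross_entropy(logits, labels, reduction="sum")
    data_bits = data_nats / math.log(2)

    # Bits to describe the model: negative log-prior over all weights.
    model_nats = sum(
        (0.5 * (p / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))).sum()
        for p in model.parameters()
    )
    model_bits = model_nats / math.log(2)

    return model_bits + data_bits  # minimize total codelength
```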

Diogo Fernandes reposted

Segment Anything 3 just silently dropped on ICLR 🤯 The first SAM let you click on an object to segment it. SAM 2 added video and memory. Now SAM 3 says: just describe what you want — “yellow school bus”, “striped cat”, “red apple” — and it will find and segment every instance…

Diogo Fernandes reposted

Just a bit of weekend coding fun: A memory estimator to calculate the savings when using grouped-query attention vs multi-head attention (+ code implementations of course). 🔗 github.com/rasbt/LLMs-fro… Will add this for multi-head latent, sliding, and sparse attention as well.

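For context, a minimal version of such an estimator, using the standard KV-cache size formula rather than the code from the linked repo. The KV cache stores keys and values per layer and per KV head, so grouped-query attention shrinks it by n_heads / n_kv_heads; the Llama-2-7B-style config and the 8-KV-head GQA variant below are illustrative choices, not the repo's example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Memory for the KV cache: two tensors (K and V) of shape
    [batch, n_kv_heads, seq_len, head_dim] per layer, fp16 by default."""
    return 2 * n_layers * batch_size * n_kv_heads * seq_len * head_dim * bytes_per_elem

# Example: a Llama-2-7B-like config (32 layers, head_dim 128, fp16, 4k context),
# comparing full multi-head attention (32 KV heads) against a hypothetical
# GQA variant with 8 KV heads.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
print(f"MHA: {mha / 1e9:.2f} GB, GQA: {gqa / 1e9:.2f} GB, saving: {1 - gqa / mha:.0%}")
```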

Diogo Fernandes reposted

🎬 Introducing: Character Mixing for Video Generation. Imagine Mr. Bean stepping into Tom & Jerry's world 🐭✨ Now it's possible! ✨ Our framework is the first to enable natural cross-character interactions in text-to-video generation while preserving identity and style fidelity.


Diogo Fernandes reposted

Samsung's Tiny Recursive Model (TRM) masters complex reasoning With just 7M parameters, TRM outperforms large LLMs on hard puzzles like Sudoku & ARC-AGI. This "Less is More" approach redefines efficiency in AI, using less than 0.01% of competitors' parameters!

Diogo Fernandes reposted

Ever wondered what CAN'T be transformed by Transformers? 🪨 I wrote a fun blog post on finding "fixed points" of your LLMs. If you prompt it with a fixed point token, the LLM is gonna decode it repeatedly forever, guaranteed. There's some connection with LLMs' repetition issue.

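A sketch of what hunting for such a fixed point could look like. This is my illustration, not necessarily the blog post's definition or method: the model choice (gpt2), the single context length, and the greedy-argmax check are all assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works for the demo; gpt2 is an arbitrary choice.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

@torch.no_grad()
def is_greedy_fixed_point(token_id: int, context_len: int = 8) -> bool:
    """Heuristic check: if a context made only of `token_id` makes the
    argmax next token `token_id` again, greedy decoding keeps emitting it."""
    ids = torch.full((1, context_len), token_id, dtype=torch.long)
    logits = model(ids).logits[0, -1]
    return int(logits.argmax()) == token_id

# Scan a slice of the vocabulary for tokens that repeat themselves (slow; demo only).
fixed = [t for t in range(1000) if is_greedy_fixed_point(t)]
print([tok.decode([t]) for t in fixed[:10]])
```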

Diogo Fernandes reposted

From the Hierarchical Reasoning Model (HRM) to a new Tiny Recursive Model (TRM). A few months ago, the HRM made big waves in the AI research community as it showed really good performance on the ARC challenge despite its small 27M size. (That's about 22x smaller than the…

Diogo Fernandes reposted

Excited to share Equilibrium Matching (EqM)! EqM simplifies and outperforms flow matching, enabling strong generative performance of FID 1.96 on ImageNet 256x256. EqM learns a single static EBM landscape for generation, enabling a simple gradient-based generation procedure.
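
For intuition, a generic sketch of what gradient-based generation from a static energy landscape means: start from noise and follow the negative energy gradient. This is not EqM's actual sampler; energy_fn, the step count, the step size, and the optional noise term are placeholders.

```python
import torch

def generate(energy_fn, shape, steps=200, step_size=0.01, noise_scale=0.0):
    """Gradient-descent sampling on a learned energy E(x): x <- x - eta * dE/dx.
    `energy_fn` maps a batch of samples to one scalar energy per sample.
    Setting noise_scale > 0 turns this into unadjusted Langevin dynamics."""
    x = torch.randn(shape)                      # start from Gaussian noise
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        energy = energy_fn(x).sum()
        (grad,) = torch.autograd.grad(energy, x)
        x = x - step_size * grad + noise_scale * torch.randn_like(x)
    return x.detach()
```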


Diogo Fernandes reposted

Flow Autoencoders Are Effective Protein Tokenizers 1. The article introduces Kanzi, a novel flow-based tokenizer for protein structures that simplifies the training process and achieves state-of-the-art performance in protein structure tokenization and generation. 2. Kanzi uses…

Diogo Fernandes reposted

An intriguing paper from Apple. MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE Paper: arxiv.org/abs/2509.17238

Diogo Fernandes reposted

⏱️ New time-series research from @GoogleResearch shows a new approach to time-series forecasting that uses continued pre-training to teach a time-series foundation model to adapt to in-context examples at inference time. The big deal is that time series forecasting finally…

Diogo Fernandes reposted

Google Research publishes on a better way to build an AI health assistant. Wayfinding AI shows that asking a few targeted questions first produces more helpful and more tailored health info than a one-shot answer. The idea is to make online health conversations more natural and useful,…

Diogo Fernandes reposted

You can teach a Transformer to execute a simple algorithm if you provide the exact step-by-step algorithm during training via CoT tokens. This is interesting, but the point of machine learning should be to *find* the algorithm during training, from input/output pairs only -- not…

A beautiful paper from MIT + Harvard + @GoogleDeepMind 👏 It explains why Transformers miss multi-digit multiplication and shows a simple bias that fixes it. The researchers trained two small Transformer models on 4-digit-by-4-digit multiplication. One used a special training method…

Diogo Fernandes reposted

This is literally every agent you'll ever need

Diogo Fernandes reposted

Negative Log-Likelihood (NLL) has long been the go-to objective for classification and SFT, but is it universally optimal? We explore when alternative objectives outperform NLL and when they don't, based on two key factors: the objective's prior-leaningness and the model's…
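
For reference, the standard token-level NLL objective used for SFT that the thread treats as the baseline. This is the generic formulation, not code from the paper; the ignore_index masking convention is the usual one for scoring only response tokens.

```python
import torch
import torch.nn.functional as F

def sft_nll(logits, labels, ignore_index=-100):
    """Token-level negative log-likelihood for SFT.
    logits: [batch, seq_len, vocab]; labels: [batch, seq_len], with prompt
    positions set to `ignore_index` so only response tokens are scored."""
    # Shift so the prediction at position t is scored against token t+1.
    logits = logits[:, :-1, :]
    labels = labels[:, 1:]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=ignore_index,
    )
```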
