
Diogo Fernandes

@dioogfernands

machine learning enthusiast

Diogo Fernandes reposted

Our paper "MaskControl: Spatio-Temporal Control for Masked Motion Synthesis" has been selected as an 🏆 Award Candidate at ICCV 2025! 🌺✨ It’s a huge honor to see our work recognized among the top papers this year.


Diogo Fernandes reposted

framework for domain-specific knowledge extraction and reasoning with LLMs

Diogo Fernandes reposted

New blog post:

Diogo Fernandes reposted

New APPLE paper says a small base model plus fetched memories can act like a bigger one. With about 10% extra fetched parameters, a 160M model matches models over 2x its size. Packing all facts into fixed weights wastes memory and compute because each query needs very little.…

Diogo Fernandes reposted

Import 3D models, auto-rig and export animations in your browser

Diogo Fernandes reposted

The paper links Kolmogorov complexity to Transformers and proposes loss functions that become provably best as model resources grow. It treats learning as compression: minimize the bits to describe the model plus the bits to describe the labels. Provides a single training target that…

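As a rough illustration of the "bits to describe the model plus bits to describe the labels" idea, here is a generic two-part MDL sketch. This is not the paper's actual loss; the Gaussian weight prior and its scale are my assumptions.

```python
import math
import torch
import torch.nn.functional as F

def two_part_description_length(model, logits, labels, sigma=0.1):
    """Two-part MDL-style objective: bits(model) + bits(labels | model).

    Illustrative only: the model term is the negative log-density of the
    weights under a zero-mean Gaussian prior with std `sigma` (an assumed
    prior, not from the paper); the data term is the cross-entropy of the
    labels in bits, i.e. their codelength under the model's predictions.
    """
    # Bits to describe the labels given the model (nats -> bits).
    data_nats = F.cross_entropy(logits, labels, reduction="sum")
    data_bits = data_nats / math.log(2)

    # Bits to describe the model: negative log-prior over all weights.
    model_nats = sum(
        (0.5 * (p / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))).sum()
        for p in model.parameters()
    )
    model_bits = model_nats / math.log(2)

    return model_bits + data_bits  # minimize total codelength
```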

Diogo Fernandes reposted

Segment Anything 3 just silently dropped on ICLR 🤯 The first SAM let you click on an object to segment it. SAM 2 added video and memory. Now SAM 3 says: just describe what you want — “yellow school bus”, “striped cat”, “red apple” — and it will find and segment every instance…

Diogo Fernandes reposted

Just a bit of weekend coding fun: A memory estimator to calculate the savings when using grouped-query attention vs multi-head attention (+ code implementations of course). 🔗 github.com/rasbt/LLMs-fro… Will add this for multi-head latent, sliding, and sparse attention as well.

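For context, a minimal version of such an estimator, using the standard KV-cache size formula rather than the code from the linked repo. The KV cache stores keys and values per layer and per KV head, so grouped-query attention shrinks it by n_heads / n_kv_heads; the Llama-2-7B-style config and the 8-KV-head GQA variant below are illustrative choices, not the repo's example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Memory for the KV cache: two tensors (K and V) of shape
    [batch, n_kv_heads, seq_len, head_dim] per layer, fp16 by default."""
    return 2 * n_layers * batch_size * n_kv_heads * seq_len * head_dim * bytes_per_elem

# Example: a Llama-2-7B-like config (32 layers, head_dim 128, fp16, 4k context),
# comparing full multi-head attention (32 KV heads) against a hypothetical
# GQA variant with 8 KV heads.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
print(f"MHA: {mha / 1e9:.2f} GB, GQA: {gqa / 1e9:.2f} GB, saving: {1 - gqa / mha:.0%}")
```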

Diogo Fernandes reposted

🎬 Introducing: Character Mixing for Video Generation. Imagine Mr. Bean stepping into Tom & Jerry's world 🐭✨ Now it's possible! ✨ Our framework is the first to enable natural cross-character interactions in text-to-video generation while preserving identity and style fidelity.


Diogo Fernandes reposted

Samsung's Tiny Recursive Model (TRM) masters complex reasoning With just 7M parameters, TRM outperforms large LLMs on hard puzzles like Sudoku & ARC-AGI. This "Less is More" approach redefines efficiency in AI, using less than 0.01% of competitors' parameters!

Diogo Fernandes reposted

Ever wondered what CAN'T be transformed by Transformers? 🪨 I wrote a fun blog post on finding "fixed points" of your LLMs. If you prompt it with a fixed point token, the LLM is gonna decode it repeatedly forever, guaranteed. There's some connection with LLMs' repetition issue.

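A sketch of what hunting for such a fixed point could look like. This is my illustration, not necessarily the blog post's definition or method: the model choice (gpt2), the single context length, and the greedy-argmax check are all assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any small causal LM works for the demo; gpt2 is an arbitrary choice.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

@torch.no_grad()
def is_greedy_fixed_point(token_id: int, context_len: int = 8) -> bool:
    """Heuristic check: if a context made only of `token_id` makes the
    argmax next token `token_id` again, greedy decoding keeps emitting it."""
    ids = torch.full((1, context_len), token_id, dtype=torch.long)
    logits = model(ids).logits[0, -1]
    return int(logits.argmax()) == token_id

# Scan a slice of the vocabulary for tokens that repeat themselves (slow; demo only).
fixed = [t for t in range(1000) if is_greedy_fixed_point(t)]
print([tok.decode([t]) for t in fixed[:10]])
```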

Diogo Fernandes reposted

From the Hierarchical Reasoning Model (HRM) to a new Tiny Recursive Model (TRM). A few months ago, the HRM made big waves in the AI research community as it showed really good performance on the ARC challenge despite its small 27M size. (That's about 22x smaller than the…

Diogo Fernandes reposted

Excited to share Equilibrium Matching (EqM)! EqM simplifies and outperforms flow matching, enabling strong generative performance of FID 1.96 on ImageNet 256x256. EqM learns a single static EBM landscape for generation, enabling a simple gradient-based generation procedure.
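
For intuition, a generic sketch of what gradient-based generation from a static energy landscape means: start from noise and follow the negative energy gradient. This is not EqM's actual sampler; energy_fn, the step count, the step size, and the optional noise term are placeholders.

```python
import torch

def generate(energy_fn, shape, steps=200, step_size=0.01, noise_scale=0.0):
    """Gradient-descent sampling on a learned energy E(x): x <- x - eta * dE/dx.
    `energy_fn` maps a batch of samples to one scalar energy per sample.
    Setting noise_scale > 0 turns this into unadjusted Langevin dynamics."""
    x = torch.randn(shape)                      # start from Gaussian noise
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        energy = energy_fn(x).sum()
        (grad,) = torch.autograd.grad(energy, x)
        x = x - step_size * grad + noise_scale * torch.randn_like(x)
    return x.detach()
```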


Diogo Fernandes reposted

Flow Autoencoders Are Effective Protein Tokenizers 1. The article introduces Kanzi, a novel flow-based tokenizer for protein structures that simplifies the training process and achieves state-of-the-art performance in protein structure tokenization and generation. 2. Kanzi uses…

Diogo Fernandes reposted

An intriguing paper from Apple. MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE Paper: arxiv.org/abs/2509.17238

Diogo Fernandes reposted

⏱️ New time-series research from @GoogleResearch shows a new approach to time-series forecasting that uses continued pre-training to teach a time-series foundation model to adapt to in-context examples at inference time. The big deal is that time series forecasting finally…

Diogo Fernandes reposted

Google Research publishes on a better way to build an AI health assistant. Wayfinding AI shows that asking a few targeted questions first produces more helpful and more tailored health info than a one-shot answer. The idea is to make online health conversations more natural and useful,…

Diogo Fernandes reposted

You can teach a Transformer to execute a simple algorithm if you provide the exact step-by-step algorithm during training via CoT tokens. This is interesting, but the point of machine learning should be to *find* the algorithm during training, from input/output pairs only -- not…

A beautiful paper from MIT + Harvard + @GoogleDeepMind 👏 It explains why Transformers miss multi-digit multiplication and shows a simple bias that fixes it. The researchers trained two small Transformer models on 4-digit-by-4-digit multiplication. One used a special training method…

Diogo Fernandes reposted

This is literally every agent you'll ever need

Diogo Fernandes reposted

Negative Log-Likelihood (NLL) has long been the go-to objective for classification and SFT, but is it universally optimal? We explore when alternative objectives outperform NLL and when they don't, based on two key factors: the objective's prior-leaningness and the model's…
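
For reference, the standard token-level NLL objective used for SFT that the thread treats as the baseline. This is the generic formulation, not code from the paper; the ignore_index masking convention is the usual one for scoring only response tokens.

```python
import torch
import torch.nn.functional as F

def sft_nll(logits, labels, ignore_index=-100):
    """Token-level negative log-likelihood for SFT.
    logits: [batch, seq_len, vocab]; labels: [batch, seq_len], with prompt
    positions set to `ignore_index` so only response tokens are scored."""
    # Shift so the prediction at position t is scored against token t+1.
    logits = logits[:, :-1, :]
    labels = labels[:, 1:]
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        ignore_index=ignore_index,
    )
```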
