
Elias

@notes_own

The past is closer to the future than now 支那猪滚

Elias reposted

Samsung's Tiny Recursive Model (TRM) masters complex reasoning. With just 7M parameters, TRM outperforms large LLMs on hard puzzles like Sudoku & ARC-AGI. This "Less is More" approach redefines efficiency in AI, using less than 0.01% of competitors' parameters!


Elias reposted

Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 is the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow…


Elias reposted

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for…
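The thread is truncated, but "explore in parameter space" is the same basic move as evolution-strategies-style methods: perturb the weights directly, score each perturbed policy, and shift the weights toward the perturbations that scored well. A minimal numpy sketch of that generic recipe (illustrative only; the paper's actual framework isn't quoted here, and every hyperparameter below is made up):

```python
import numpy as np

def es_step(theta, evaluate, sigma=0.05, lr=0.005, pop_size=64, rng=None):
    """One evolution-strategies-style update: explore by perturbing
    parameters directly instead of sampling actions stochastically."""
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal((pop_size, theta.size))          # one perturbation per worker
    rewards = np.array([evaluate(theta + sigma * n) for n in noise])
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad_estimate = (advantages[:, None] * noise).mean(axis=0) / sigma
    return theta + lr * grad_estimate

# Toy objective: reward is highest when the parameters reach a target vector.
target = np.ones(8)
reward_fn = lambda w: -np.sum((w - target) ** 2)

theta = np.zeros(8)
for _ in range(300):
    theta = es_step(theta, reward_fn)
print(np.round(theta, 2))  # approaches the target without any action-space sampling
```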


Elias reposted

🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and…
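The tweet cuts off before the algorithm, so the sketch below shows only the generic ingredient any stale-data setup relies on: an importance ratio between the current policy and the stale behavior policy, with some stabilizer on that ratio. This is the textbook correction, not M2PO itself (M2PO's specific constraint isn't described in the visible text):

```python
import numpy as np

def offpolicy_pg_loss(logp_current, logp_stale, advantages, clip=5.0):
    """Policy-gradient loss on stale (off-policy) samples.
    The importance ratio corrects for the gap between the policy that
    generated the data and the policy being trained; clipping the ratio
    is one crude way to keep very stale samples from blowing up updates."""
    ratio = np.exp(logp_current - logp_stale)   # pi_current / pi_stale
    ratio = np.clip(ratio, 0.0, clip)           # stabilizer (M2PO uses its own mechanism)
    return -(ratio * advantages).mean()

# Toy numbers: three stale samples with their advantages.
logp_stale   = np.log(np.array([0.20, 0.05, 0.50]))
logp_current = np.log(np.array([0.30, 0.01, 0.55]))
advantages   = np.array([1.0, -0.5, 0.2])
print(offpolicy_pg_loss(logp_current, logp_stale, advantages))
```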


Elias reposted

In 1951, the Ministry of National Defense printed «中国国民党党员守则浅释» (A Plain Explanation of the Code of Conduct for Chinese Nationalist Party Members) for the rectification of the army and the party: "Soviet-Russian imperialism has made use of the communist-bandit traitor clique of Zhu and Mao, seeking to destroy our Republic of China and enslave our Chinese nation…"


Elias reposted

Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment. 💡We propose dividing LLM parameters into 1) an anchor (always used, capturing commonsense) and 2) a memory bank (selected per query, capturing world knowledge). [1/X]🧵
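As a rough illustration of the anchor/memory-bank split (my own toy reading, not the paper's architecture): a small always-on network handles every query, while a pool of memory blocks is scored against the query and only the top few are applied, so most parameters can stay out of the active working set.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_memories, k = 16, 8, 2

anchor_W = rng.standard_normal((d, d)) * 0.1               # always-used "anchor" parameters
memory_keys = rng.standard_normal((n_memories, d))         # one key per memory block
memory_W = rng.standard_normal((n_memories, d, d)) * 0.1   # memory-bank parameters

def forward(x):
    """Anchor path always runs; only the k best-matching memory blocks are
    fetched, so the bulk of the parameters need not sit in the hot path."""
    h = np.tanh(anchor_W @ x)
    scores = memory_keys @ x               # score each memory block against the query
    chosen = np.argsort(scores)[-k:]       # select the top-k blocks for this query
    for i in chosen:
        h = h + np.tanh(memory_W[i] @ x)   # add knowledge from the selected memories
    return h

print(forward(rng.standard_normal(d)).shape)   # (16,)
```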


Elias reposted

My brain broke when I read this paper. A tiny 7 million parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI 1 and ARC-AGI 2. It's called the Tiny Recursive Model (TRM) from Samsung. How can a model 10,000x smaller be smarter? Here's how…
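A hedged sketch of the recursive-refinement idea behind TRM: a tiny network is reused many times, alternately updating a latent scratchpad and the current answer, so depth comes from iteration rather than parameter count. Shapes, step counts, and update rules below are illustrative assumptions, not the actual TRM design:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32
W_latent = rng.standard_normal((d, 3 * d)) * 0.05   # tiny shared weights, reused every step
W_answer = rng.standard_normal((d, 2 * d)) * 0.05

def trm_like_solve(x, outer_steps=3, inner_steps=6):
    """Recursive refinement: a small network updates a latent state several
    times, then updates the answer, and the whole cycle repeats."""
    z = np.zeros(d)   # latent "scratchpad"
    y = np.zeros(d)   # current answer estimate
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            z = np.tanh(W_latent @ np.concatenate([x, y, z]))   # refine reasoning state
        y = np.tanh(W_answer @ np.concatenate([y, z]))          # refine the answer
    return y

print(trm_like_solve(rng.standard_normal(d))[:4])
```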


Elias reposted

Absolutely classic @GoogleResearch paper on In-Context Learning by LLMs. It shows the mechanisms by which LLMs learn in context from examples in the prompt: they can pick up new patterns while answering, yet their stored weights never change. 💡The mechanism they reveal for…


Elias reposted

No Prompt Left Behind: A New Era for LLM Reinforcement Learning. This paper introduces RL-ZVP, a novel algorithm that unlocks learning signals from previously ignored "zero-variance prompts" in LLM training. It achieves significant accuracy improvements on math reasoning…
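For context on why "zero-variance prompts" were previously ignored: in group-normalized methods like GRPO, a prompt whose sampled responses all receive the same reward yields zero advantage for every response, hence zero gradient. The numpy snippet below shows that failure mode; how RL-ZVP recovers signal from such prompts is in the paper, not in the truncated tweet:

```python
import numpy as np

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize rewards within the group of
    responses sampled for the same prompt."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # mixed outcomes -> useful signal
print(group_advantages([1.0, 1.0, 1.0, 1.0]))  # zero-variance prompt -> advantages ~0, no gradient
```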


Elias reposted

behold... the CUDA grid


Elias reposted

This is a super cool topic! I did a class project with a friend for a course on kernels during senior year in college. 🔗: github.com/rish-16/ma4270… Lots of fun connections between kernels and self-attention, especially when learning periodic functions. The attention patterns…


How Kernel Regression is related to Attention Mechanism - a summary in 10 slides. 0/1
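The connection in one line: Nadaraya-Watson kernel regression predicts with a kernel-weighted average of training targets, and single-query softmax attention is the same weighted average with an exponential dot-product kernel, keys playing the role of training inputs and values the role of targets. A small numpy sketch (the Gaussian kernel on the regression side is just one common choice):

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, bandwidth=1.0):
    """Kernel regression: prediction = kernel-weighted average of targets."""
    sq_dists = np.sum((x_train - x_query) ** 2, axis=-1)
    weights = np.exp(-sq_dists / (2 * bandwidth ** 2))
    weights /= weights.sum()
    return weights @ y_train

def softmax_attention(query, keys, values, scale=None):
    """Single-query attention: the same weighted average, with an
    exponential dot-product kernel instead of a Gaussian one."""
    scale = scale or 1.0 / np.sqrt(keys.shape[-1])
    logits = keys @ query * scale
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ values

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((10, 4)), rng.standard_normal((10, 3))
q = rng.standard_normal(4)
print(nadaraya_watson(q, X, Y))     # kernel regression estimate
print(softmax_attention(q, X, Y))   # attention output: keys=X, values=Y
```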



Elias reposted

The paper shows that Group Relative Policy Optimization (GRPO) behaves like Direct Preference Optimization (DPO), so training on simple answer pairs works. Turns a complex GRPO setup into a simple pairwise recipe without losing quality. This cuts tokens, compute, and wall time,…
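The tweet is truncated, so the snippet below only spells out the pairwise recipe it points at: the standard DPO loss over a (chosen, rejected) answer pair against a frozen reference model. Whether the paper's reduction of GRPO matches this exact objective isn't visible here; treat it as the generic pairwise baseline:

```python
import numpy as np

def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise preference loss: push the policy to prefer the chosen answer
    over the rejected one, relative to a frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))   # -log sigmoid(margin)

# Toy pair: the policy already slightly prefers the chosen answer.
print(dpo_pair_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                    ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))
```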


Elias reposted

🚀 Want high-quality, realistic, and truly challenging post-training data for the agentic era? Introducing Toucan-1.5M (huggingface.co/papers/2510.01…) — the largest open tool-agentic dataset yet: ✨ 1.53M real agentic trajectories synthesized by 3 models ✨ Diverse, challenging tasks…


Elias reposted

This is a solid blog breaking down how mixture-of-experts (MoE) language models can actually be served cheaply if their design is matched to hardware limits. MoE models only activate a few experts per token, which saves compute but causes heavy memory and communication use.…
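To make the "activate a few experts per token" point concrete: a gating network scores all experts for each token and only the top-k are evaluated, so compute scales with k while the full expert bank still has to be resident in memory. A minimal numpy sketch of top-k routing (toy sizes, tanh layers standing in for real FFN experts):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 32, 8, 2

W_gate = rng.standard_normal((n_experts, d_model)) * 0.1
experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.1  # all experts live in memory

def moe_layer(x):
    """Route the token to its top-k experts. Compute touches only k experts,
    but the whole expert bank must be resident (the memory/communication
    cost the blog is about)."""
    logits = W_gate @ x
    chosen = np.argsort(logits)[-top_k:]                  # indices of the selected experts
    gate = np.exp(logits[chosen] - logits[chosen].max())
    gate /= gate.sum()                                    # softmax over the selected experts
    return sum(g * np.tanh(experts[i] @ x) for g, i in zip(gate, chosen))

print(moe_layer(rng.standard_normal(d_model)).shape)      # (32,)
```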


Elias reposted

Analysis of the KL estimators k1, k2, and k3 and their use as a reward or as a loss. This partially continues previous discussions on it: x.com/QuanquanGu/sta…. By the way, RLHF is a bit misleading; they did RLVR.


The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
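For reference, the estimators under discussion (following Schulman's "Approximating KL Divergence" note): with r = p(x)/q(x) for x ~ q, k1 = -log r, k2 = (log r)^2 / 2, and k3 = (r - 1) - log r. A quick numpy check on a Gaussian toy case where the true KL is known, with a comment on where the importance weight enters when samples come from an older policy:

```python
import numpy as np

# Single-sample estimators of KL(q || p) from x ~ q, with r = p(x) / q(x).
def k1(logr): return -logr
def k2(logr): return 0.5 * logr ** 2
def k3(logr): return (np.exp(logr) - 1.0) - logr

rng = np.random.default_rng(0)
# Toy case: q = N(0, 1), p = N(0.5, 1); true KL(q || p) = 0.5 * 0.5**2 = 0.125
x = rng.standard_normal(200_000)
logr = (-0.5 * (x - 0.5) ** 2) - (-0.5 * x ** 2)   # log p(x) - log q(x)

for name, est in [("k1", k1), ("k2", k2), ("k3", k3)]:
    vals = est(logr)
    print(f"{name}: mean={vals.mean():.4f}  std={vals.std():.3f}")
# k1 and k3 are unbiased (mean ~0.125); k3 has noticeably lower variance
# because it is always non-negative. If x were instead drawn from an older
# policy rather than q, each term would additionally need an importance
# weight q(x)/q_old(x) -- roughly the correction being pointed out here.
```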



Elias reposted

Meta just ran one of the largest synthetic-data studies (over 1000 LLMs, more than 100k GPU hours). Result: mixing synthetic and natural data only helps once you cross the right scale and ratio (~30%). Small models learn nothing; larger ones suddenly gain a sharp threshold…


Data Mixing Can Induce Phase Transitions in Knowledge Acquisition. A CLEAN, FORMAL BREAKDOWN OF WHY YOUR 7B LLM LEARNS NOTHING FROM HIGH-QUALITY DATA. This paper reveals phase transitions in factual memorization…



Elias reposted

RL fine-tuning often prematurely collapses policy entropy. We consider a general framework, called set RL, i.e. RL over a set of trajectories from a policy. We use it to incentivize diverse solutions & optimize for inference time performance. Paper: arxiv.org/abs/2509.25424

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks…
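One illustrative way to read "RL over a set of trajectories" (my own toy example, not necessarily the paper's objective): score the whole sampled set rather than each trajectory in isolation, for instance by its best member, which rewards keeping diverse candidates around and tracks best-of-k inference performance:

```python
import numpy as np

def per_sample_signal(rewards):
    """Standard RL: each trajectory is scored on its own, so mass tends to
    concentrate on the single highest-mean behavior and entropy collapses."""
    return np.asarray(rewards, dtype=float)

def set_level_signal(rewards):
    """Illustrative set-level objective (an assumption, not the paper's
    definition): score the *set* by its best member, a proxy for best-of-k
    inference performance, and credit every trajectory with that set score."""
    rewards = np.asarray(rewards, dtype=float)
    return np.full_like(rewards, rewards.max())

# A safe policy (always 0.6) vs a diverse set that sometimes finds a 1.0
# solution. Per-sample scoring prefers the safe policy; set-level scoring
# prefers the diverse one.
safe_set    = [0.6, 0.6, 0.6, 0.6]
diverse_set = [0.0, 1.0, 0.2, 0.4]
print(per_sample_signal(safe_set).mean(), per_sample_signal(diverse_set).mean())  # 0.6 vs 0.4
print(set_level_signal(safe_set).mean(),  set_level_signal(diverse_set).mean())   # 0.6 vs 1.0
```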



Elias reposted

🚨🚨New Paper: Training LLMs to Discover Abstractions for Solving Reasoning Problems Introducing RLAD, a two-player RL framework for LLMs to discover 'reasoning abstractions'—natural language hints that encode procedural knowledge for structured exploration in reasoning.🧵⬇️


Elias reposted

A look at the history of Reinforcement Learning. What is Temporal-Difference (TD) learning? @RichardSSutton introduced TD learning in 1988, and today most widely used RL algorithms, like deep actor-critic methods, rely on the TD error as the learning signal. So, TD learning: ▪️ Allows…
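To make "TD error as the learning signal" concrete, here is TD(0) value estimation on a toy random-walk chain: each update moves a state's value toward the immediate reward plus the discounted value of the next state, without waiting for the episode's final return.

```python
import numpy as np

# Toy 5-state chain: start in the middle, step left/right at random,
# reward 1 only when reaching the right end, 0 otherwise.
n_states, gamma, alpha = 5, 0.95, 0.1
V = np.zeros(n_states + 2)             # state values; indices 0 and n+1 are terminal
rng = np.random.default_rng(0)

for _ in range(5000):
    s = 3                              # start in the middle of the chain
    while 1 <= s <= n_states:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n_states + 1 else 0.0
        td_error = r + gamma * V[s_next] - V[s]   # bootstrap target minus current estimate
        V[s] += alpha * td_error                  # learn from the TD error, online, step by step
        s = s_next

print(np.round(V[1:-1], 2))            # values increase toward the rewarding end
```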


Elias reposted

Shirokuro, a Japanese restaurant in NYC's East Village, where the interior mimics a hand-drawn black-and-white sketchbook.

From xan
