
Elias

@notes_own

The past is closer to the future than now. Shina pigs, get lost.

Elias reposted

pretty sure pewdiepie watched this.


Elias reposted

Recently, Japanese House of Councillors member Seki Hei (石平) spent 29 minutes on his personal YouTube channel tearing into Chinese President Xi Jinping! Choice lines included, but were not limited to, "primary schooler," "dumbass," "stupid pig," and "a lowlife"! Firepower this fierce is genuinely rare among sitting members of Japan's House of Councillors! The video link is in the first comment below 👇👇👇


Elias reposted

Laughed until I squealed. Recently, Japanese House of Councillors member Seki Hei (石平) went all out on "Xi the Bun" (习包子) on his personal YouTube channel, for a full 29 minutes! Choice lines included, but were not limited to, "primary schooler," "dumbass," and "stupid pig"! Well said. Posting it to give the little pinks a meltdown 🤪


Elias reposted

Samsung's Tiny Recursive Model (TRM) masters complex reasoning With just 7M parameters, TRM outperforms large LLMs on hard puzzles like Sudoku & ARC-AGI. This "Less is More" approach redefines efficiency in AI, using less than 0.01% of competitors' parameters!


Elias reposted

Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 is the first non-autoregressive model to generate text and images concurrently using a single transformer, unifying Edit Flow (text) with Flow…


Elias reposted

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for…
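
For context on what "explore directly in parameter space" means, here is the textbook evolution-strategies-style estimator, which perturbs the weights themselves instead of sampling alternative actions or tokens. This is a generic illustration only, not the framework the truncated thread proposes.

```latex
% Smoothed objective: J_sigma(theta) = E_{eps ~ N(0,I)} [ R(theta + sigma * eps) ].
% Its gradient can be estimated purely from returns of perturbed parameter vectors:
\[
\nabla_\theta J_\sigma(\theta) \;\approx\; \frac{1}{n\,\sigma} \sum_{i=1}^{n} R\!\left(\theta + \sigma\,\epsilon_i\right)\epsilon_i,
\qquad \epsilon_i \sim \mathcal{N}(0, I).
\]
% Exploration lives in the weight perturbations eps_i (parameter space),
% rather than in the sampled actions, as in PPO/GRPO (action space).
```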


Elias reposted

🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and…


Elias reposted

Introducing Pretraining with Hierarchical Memories: Separating Knowledge & Reasoning for On-Device LLM Deployment 💡We propose dividing LLM parameters into 1) anchor (always used, capturing commonsense) and 2) memory bank (selected per query, capturing world knowledge). [1/X]🧵


Elias reposted

My brain broke when I read this paper. A tiny 7 million parameter model just beat DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning on both ARC-AGI 1 and ARC-AGI 2. It's called Tiny Recursive Model (TRM), from Samsung. How can a model 10,000x smaller be smarter? Here's how…
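
A quick sanity check on the size claims in these TRM reposts, using only the quoted numbers (the baseline size of roughly 70B is inferred, not stated in either tweet):

```latex
% "10,000x smaller" than a ~7M-parameter model implies baselines around 70B parameters,
% which is consistent with the earlier "< 0.01% of competitors' parameters" claim:
\[
7\times 10^{6} \times 10^{4} \;=\; 7\times 10^{10} \;=\; 70\text{B},
\qquad
\frac{7\times 10^{6}}{7\times 10^{10}} \;=\; 10^{-4} \;=\; 0.01\%.
\]
```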


Elias reposted

Absolutely classic @GoogleResearch paper on in-context learning by LLMs. Shows the mechanisms of how LLMs learn in context from examples in the prompt and can pick up new patterns while answering, even though their stored weights never change. 💡The mechanism they reveal for…


Elias reposted

No Prompt Left Behind: A New Era for LLM Reinforcement Learning. This paper introduces RL-ZVP, a novel algorithm that unlocks learning signals from previously ignored "zero-variance prompts" in LLM training. It achieves significant accuracy improvements on math reasoning…
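
For background on why "zero-variance prompts" are normally ignored (a hedged note on standard group-relative training, not a description of RL-ZVP's actual mechanism): when every sampled response to a prompt gets the same reward, the group-normalized advantage is zero for all of them, so the prompt contributes no gradient.

```latex
% Generic GRPO-style advantage over a group of G sampled responses (notation is mine, not the paper's):
\[
\hat{A}_i \;=\; \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G) + \varepsilon}
\;=\; 0 \qquad \text{whenever } r_1 = r_2 = \dots = r_G,
\]
% so prompts whose samples all succeed (or all fail) are effectively dropped;
% RL-ZVP's claim is that a learning signal can still be extracted from these cases.
```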


Elias reposted

behold... the CUDA grid
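
For anyone unfamiliar with the term: "the CUDA grid" is the grid-of-blocks-of-threads hierarchy that every kernel launch defines. A minimal illustrative sketch (unrelated to the visualization in the image):

```cuda
#include <cstdio>

// Each thread locates itself in the launch hierarchy:
// the grid contains gridDim.x blocks, each block contains blockDim.x threads.
__global__ void whereAmI() {
    int globalIdx = blockIdx.x * blockDim.x + threadIdx.x;
    printf("block %d, thread %d -> global index %d\n",
           blockIdx.x, threadIdx.x, globalIdx);
}

int main() {
    whereAmI<<<4, 8>>>();      // launch a grid of 4 blocks with 8 threads each
    cudaDeviceSynchronize();   // wait for device-side printf to flush
    return 0;
}
```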


Elias reposted

This is a super cool topic! I did a class project with a friend for a course on Kernels during senior year in college 🔗: github.com/rish-16/ma4270… Lots of fun connections between kernels and self-attention, especially when learning periodic functions The attention patterns…


How Kernel Regression is related to Attention Mechanism - a summary in 10 slides. 0/1
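
The correspondence the slides summarize, sketched in standard notation (this is the well-known Nadaraya-Watson view, not necessarily the deck's exact formulation): kernel regression is a kernel-weighted average of observed values, and softmax attention has the same form with an exponential kernel over query-key similarity.

```latex
% Nadaraya-Watson kernel regression over training pairs (x_i, y_i), evaluated at a query q:
\[
\hat{f}(q) \;=\; \frac{\sum_i K(q, x_i)\, y_i}{\sum_j K(q, x_j)}.
\]
% Choosing K(q, x) = exp(q^T x / sqrt(d)), with keys k_i = x_i and values v_i = y_i,
% recovers scaled dot-product (softmax) attention:
\[
\operatorname{Attn}(q) \;=\; \sum_i \operatorname{softmax}_i\!\left(\frac{q^\top k_i}{\sqrt{d}}\right) v_i.
\]
```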



Elias reposted

The paper shows that Group Relative Policy Optimization (GRPO) behaves like Direct Preference Optimization (DPO), so training on simple answer pairs works. Turns a complex GRPO setup into a simple pairwise recipe without losing quality. This cuts tokens, compute, and wall time,…
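
A sketch of why a group of two already looks pairwise (generic group-relative algebra under standard GRPO normalization, not necessarily the paper's exact argument): with two responses whose rewards differ, the normalized advantages collapse to plus/minus one, which is the same preferred-versus-rejected signal a DPO pair carries.

```latex
% Group of size G = 2 with rewards r_1 > r_2, using the population standard deviation:
\[
\mu = \tfrac{r_1 + r_2}{2}, \qquad \sigma = \tfrac{r_1 - r_2}{2}, \qquad
\hat{A}_1 = \frac{r_1 - \mu}{\sigma} = +1, \qquad
\hat{A}_2 = \frac{r_2 - \mu}{\sigma} = -1.
\]
% The magnitude of the reward gap cancels out; only which answer is better survives,
% i.e. exactly the information a preference pair provides.
```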


Elias reposted

🚀 Want high-quality, realistic, and truly challenging post-training data for the agentic era? Introducing Toucan-1.5M (huggingface.co/papers/2510.01…), the largest open tool-agentic dataset yet:
✨ 1.53M real agentic trajectories synthesized by 3 models
✨ Diverse, challenging tasks…


Elias reposted

This is a solid blog breaking down how mixture-of-experts (MoE) language models can actually be served cheaply if their design is matched to hardware limits. MoE models only activate a few experts per token, which saves compute but causes heavy memory and communication use.…
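
A back-of-the-envelope illustration of that compute/memory asymmetry, with hypothetical numbers not taken from the blog: a top-2-of-64 MoE layer does work with only a small slice of its expert weights per token, yet every expert must stay resident in accelerator memory.

```latex
% Hypothetical MoE layer: E = 64 experts, top-k routing with k = 2, p parameters per expert.
\[
\text{FLOPs per token} \;\propto\; k\,p = 2p,
\qquad
\text{expert weights held in memory} \;=\; E\,p = 64p,
\]
% so roughly 2/64 (about 3%) of the expert parameters compute for a given token while
% 100% of them occupy HBM, which is why serving cost is dominated by memory capacity
% and expert-parallel communication rather than raw arithmetic.
```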


Elias reposted

Analysis of the KL estimators k1, k2, and k3 and their use as a reward or as a loss. This is partly a continuation of earlier discussion at x.com/QuanquanGu/sta…. By the way, RLHF is a bit misleading here; they did RLVR.


The original GRPO is an off-policy RL algorithm, but its KL regularization isn't done right. Specifically, the k3 estimator for the unnormalized reverse KL is missing the importance weight. The correct formulation should be:
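
The formula the tweet points at was in the attached image and is not recoverable here; the following is a reconstruction of the correction being described, in standard notation (π_θ the current policy, π_old the sampling policy, π_ref the reference), not a copy of the original.

```latex
% k3 estimator of the reverse KL(pi_theta || pi_ref), with ratio r = pi_ref(y|x) / pi_theta(y|x).
% On-policy, with samples y ~ pi_theta:
\[
\widehat{\mathrm{KL}}_{k3} \;=\; \mathbb{E}_{y \sim \pi_\theta}\!\left[\, r - 1 - \log r \,\right].
\]
% When samples actually come from a stale pi_old (GRPO's off-policy updates), an
% importance weight pi_theta / pi_old is needed for the estimator to stay unbiased:
\[
\widehat{\mathrm{KL}}_{k3}^{\,\mathrm{corrected}} \;=\;
\mathbb{E}_{y \sim \pi_{\mathrm{old}}}\!\left[\,
\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{old}}(y \mid x)}\,
\bigl( r - 1 - \log r \bigr) \right].
\]
```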



Elias reposted

Meta just ran one of the largest synthetic-data studies (over 1000 LLMs, more than 100k GPU hours). Result: mixing synthetic and natural data only helps once you cross the right scale and ratio (~30%). Small models learn nothing; larger ones suddenly gain a sharp threshold…


Data Mixing Can Induce Phase Transitions in Knowledge Acquisition. A CLEAN, FORMAL BREAKDOWN OF WHY YOUR 7B LLM LEARNS NOTHING FROM HIGH-QUALITY DATA. This paper reveals phase transitions in factual memorization…



Elias reposted

RL fine-tuning often prematurely collapses policy entropy. We consider a general framework, called set RL, i.e. RL over a set of trajectories from a policy. We use it to incentivize diverse solutions & optimize for inference time performance. Paper: arxiv.org/abs/2509.25424

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks…



Elias reposted

🚨🚨New Paper: Training LLMs to Discover Abstractions for Solving Reasoning Problems. Introducing RLAD, a two-player RL framework for LLMs to discover 'reasoning abstractions': natural language hints that encode procedural knowledge for structured exploration in reasoning.🧵⬇️

