Steven
@PlsGoForIt

Code. Don't be a fool

Pinned

青山依旧在,几度夕阳红 ("the green hills remain as ever; how many times has the setting sun glowed red") — meaning that all heroes and kings die, no matter how great their achievements. This spirit may have been influenced by Buddhism after the Tang dynasty. Is this kind of disillusionment unique to Chinese culture, or do other cultures share the same spirit?

Steven reposted

Reinforcement Learning of Large Language Models, Spring 2025 (UCLA). A great set of new lectures on reinforcement learning of LLMs. Covers a wide range of RLxLLM topics: basics/foundations, test-time compute, RLHF, and RL with verifiable rewards (RLVR).

Steven reposted

We're sharing the insights and the technical report behind Mem-Agent, our 4B model for persistent memory in LLMs. How we built it, the benchmarks, and why it works:

Steven reposted

Launching the Chinese translation project for "Agentic Design Patterns" (《智能体设计模式》).

Over the coming week I will translate the book via AI first-pass translation → AI cross-review → careful human review, with all translated content continuously updated in the open-source project: github.com/ginobefun/agen…

The book is written by Antonio Gulli, with forewords by Saurabh Tiwary, VP of Google Cloud AI, and Goldman Sachs CIO Marco Argenti…

Just as "Design Patterns" was once the bible of software engineering, this "Agentic Design Patterns", shared for free by a senior Google engineering leader, brings the first systematic set of design principles and best practices to the booming AI agent field. 🔽

Steven reposted

BTW, they released a deep dive on the FP8 KV cache of the main MLA. github.com/deepseek-ai/Fl… So it's actually ≈1/5 compared to FP8 dense MLA.

As expected, NSA is not compatible with MLA, so DeepSeek chose another method: use a smaller (d=128) attention (without values) as the indexer. Asymptotic cost ratio = 128/576. In addition, the indexer uses FP8 while the main MLA uses 16-bit, so the ratio = 64/576 = 1/9.

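Spelling out the arithmetic in the thread (a quick sketch; the 576 and 128 dimensions and the byte widths come from the tweets, the rest is plain bookkeeping):

```python
# Per-token cost bookkeeping behind the ratios quoted above.
d_mla, d_indexer = 576, 128   # main MLA compressed-KV dims vs. indexer key dims

same_precision = d_indexer / d_mla                        # both at one precision
indexer_fp8_vs_mla_16bit = (d_indexer * 1) / (d_mla * 2)  # 1-byte vs 2-byte elements
both_fp8 = (d_indexer * 1) / (d_mla * 1)                  # FP8 KV cache for main MLA too

print(same_precision)             # 0.2222...  (= 128/576)
print(indexer_fp8_vs_mla_16bit)   # 0.1111...  (= 64/576 = 1/9)
print(both_fp8)                   # 0.2222...  (≈ 1/5, the figure in the tweet)
```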


Steven reposted

Cool new blog post by Thinking Machines: LoRA is all you need for SFT and RL, even for medium-sized post-training runs. Some highlights: rank 64 or 128 seems to be very close to full FT in performance. Also interesting findings on how to set learning rates: The optimal FullFT…

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…

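For anyone who hasn't seen the setup: LoRA freezes the pretrained weight and trains only a low-rank update. A minimal PyTorch sketch (the class name and the rank-64 default are illustrative, not code from the post):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 64, alpha: int = 128):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        # B is zero-initialized, so training starts exactly at the base model
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=64)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train
```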


Steven reposted

How does @deepseek_ai Sparse Attention (DSA) work? It has 2 components: the Lightning Indexer and Sparse Multi-head Latent Attention (MLA). The indexer keeps a small key cache of 128 per token (vs. 512 for MLA). It scores the cached tokens against each incoming query, and the top-2048 tokens are passed to Sparse MLA.

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n
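A toy sketch of the selection step described above (numpy; shapes and names are illustrative, not the actual DeepSeek kernels). The indexer scores every cached token with a small d=128 dot product, and only the top 2048 survive to the full sparse attention:

```python
import numpy as np

def lightning_indexer_select(q_idx, k_idx_cache, top_k=2048):
    """Score every cached token with the small (d=128) indexer key
    and return the indices of the top_k highest-scoring tokens."""
    scores = k_idx_cache @ q_idx            # (seq_len,) cheap dot products
    if len(scores) <= top_k:
        return np.arange(len(scores))
    return np.argpartition(scores, -top_k)[-top_k:]

seq_len, d_idx = 100_000, 128
q_idx = np.random.randn(d_idx).astype(np.float32)
k_idx_cache = np.random.randn(seq_len, d_idx).astype(np.float32)

keep = lightning_indexer_select(q_idx, k_idx_cache)
print(keep.shape)   # (2048,) -- only these positions go through sparse MLA
```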



Steven reposted

ReasoningBank: memory for self-evolving LLM agents

• Distills strategies from both successes & failures
• Enables agents to learn, reuse, and improve over time
• Outperforms prior memory methods on web & SWE tasks (+34.2% eff., –16% steps)

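A conceptual sketch of the loop the paper describes, with entirely hypothetical names; in the real system an LLM distills each trajectory into a reusable strategy and retrieval is embedding-based, both stubbed out here:

```python
from dataclasses import dataclass

@dataclass
class StrategyItem:
    task_kind: str
    lesson: str        # distilled from a success OR a failure
    succeeded: bool

class ReasoningBankSketch:
    """Toy memory: store distilled lessons, retrieve them for new tasks."""
    def __init__(self):
        self.items: list[StrategyItem] = []

    def distill(self, task_kind: str, trajectory: str, succeeded: bool):
        # Real system: an LLM summarizes the trajectory into a strategy.
        lesson = f"{'do' if succeeded else 'avoid'}: {trajectory[:40]}"
        self.items.append(StrategyItem(task_kind, lesson, succeeded))

    def retrieve(self, task_kind: str, k: int = 3):
        hits = [i for i in self.items if i.task_kind == task_kind]
        return hits[-k:]   # real system: embedding similarity, not recency

bank = ReasoningBankSketch()
bank.distill("web-nav", "clicked pagination before filtering results", succeeded=False)
bank.distill("web-nav", "applied filters first, then paginated", succeeded=True)
print([i.lesson for i in bank.retrieve("web-nav")])
```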

Steven reposted

I quite enjoyed this, and it covers a bunch of topics without good introductory resources!

1. A bunch of GPU hardware details in one place (warp schedulers, shared memory, etc.)
2. A breakdown/walkthrough of reading PTX and SASS.
3. Some details/walkthroughs of a number of other…

New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state-of-the-art matmul kernels in CUDA, read along. (Remember, matmul is the single most important operation that transformers execute…

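The core idea such kernels are organized around is tiling: stage small blocks of A and B in fast shared memory and reuse them across the accumulation. A numpy sketch of just the blocking (everything the blog actually covers, warps, tensor cores, pipelining, is omitted):

```python
import numpy as np

def blocked_matmul(A, B, tile=64):
    """C = A @ B computed tile-by-tile, mirroring how a CUDA kernel
    stages tiles of A and B in shared memory and accumulates C."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            acc = np.zeros((min(tile, M - i), min(tile, N - j)), dtype=A.dtype)
            for k in range(0, K, tile):   # loop over K tiles (the "mainloop")
                acc += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            C[i:i+tile, j:j+tile] = acc
    return C

A = np.random.randn(256, 256).astype(np.float32)
B = np.random.randn(256, 256).astype(np.float32)
print(np.allclose(blocked_matmul(A, B), A @ B, atol=1e-3))  # True
```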


Steven reposted

Introducing LLM.Q: quantized LLM training in pure CUDA/C++! With LLM.Q, you can train your own LLM with natively quantized matmuls on single consumer-GPU workstations. No datacenter required. Inspired by @karpathy's llm.c, but natively quantized.
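Not LLM.Q's actual code (that is CUDA/C++), but the core trick of a natively quantized matmul fits in a few lines of numpy: quantize both operands to int8, multiply with int32 accumulation, and rescale once at the end:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x ≈ q * scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    # int32 accumulation of int8 operands, dequantized once at the end
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)

a = np.random.randn(64, 64).astype(np.float32)
b = np.random.randn(64, 64).astype(np.float32)
print(np.abs(int8_matmul(a, b) - a @ b).max())  # small vs. the scale of a @ b
```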


Steven reposted

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”. We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to…

Steven reposted

Some papers are timeless and "essential reading"!

Understanding Integer Overflow in C/C++
users.cs.utah.edu/~regehr/papers…

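The paper's subject in one line: in C, signed integer overflow is undefined behavior, even though the underlying hardware quietly wraps in two's complement. A small Python illustration of the wraparound that a C compiler is allowed to assume never happens (the helper just truncates to 32 bits):

```python
def to_int32(x: int) -> int:
    """Truncate an unbounded Python int to a two's-complement 32-bit
    value, mimicking what the hardware does on overflow."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

INT_MAX = 2**31 - 1
# In C, `INT_MAX + 1` on a signed int is undefined behavior, so the
# compiler may assume `x + 1 > x` always holds and optimize on that.
print(to_int32(INT_MAX + 1))   # -2147483648: the silent hardware wraparound
print(to_int32(-5))            # -5: in-range values are unchanged
```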

Steven reposted

Recommendations for free GPUs and cheap compute, good for learning to fine-tune and train models under 10B parameters. These are the environments I recommend to my students for practicing model training. github.com/ninehills/blog…

Steven reposted

Good morning with a banger resource on vLLM

Steven reposted

The Ultimate Guide to Fine-Tuning LLMs.

Here’s a 7-step roadmap that makes it simple. This technical report takes a deep look at the fine-tuning process for LLMs, combining both theory and practice.

1. Introduction
2. Seven Stage Fine-Tuning Pipeline
   ↳ Stage-1: Data…

Steven reposted

btw it's possible to use mantissa-free weights. The Chief Scientist of NVIDIA had a keynote with content about it. nvidia.com/en-us/on-deman…

Not sure if this is the case now, but some miss that UE8M0 (Unsigned, Exponent 8, Mantissa 0), used for V3.1, is a microscaling data format. They're NOT using mantissa-free *weights*. It's just for large dynamic range and cheaply applied scale factors.

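What UE8M0 means concretely: 8 exponent bits, no sign bit, no mantissa bits, so every scale factor is exactly a power of two. A sketch of encode/decode under the usual E8M0 reading from the OCP microscaling (MX) spec (bias 127 assumed; the all-ones code is NaN there and skipped here):

```python
import math

BIAS = 127  # standard E8M0 bias in the OCP microscaling (MX) spec

def ue8m0_encode(scale: float) -> int:
    """Round a positive scale to the nearest power of two and store
    only the biased exponent: 8 bits, no sign, no mantissa."""
    e = round(math.log2(scale))
    return max(0, min(254, e + BIAS))   # 255 is NaN in the MX spec; skipped

def ue8m0_decode(bits: int) -> float:
    return 2.0 ** (bits - BIAS)

for s in (0.0009, 1.0, 48.0):
    bits = ue8m0_encode(s)
    print(s, "->", bits, "->", ue8m0_decode(bits))
# Scales snap to powers of two (e.g. 48 -> 64), but the representable
# range spans 2**-127 .. 2**127: the "large dynamic range, cheaply
# applied scale factors" point in the tweet.
```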


Steven reposted

RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs

"RL primarily counteracts SFT-induced directional drift rather than finding new solutions. Our spectrum-aware analysis highlights inexpensive recovery knobs: low-rank…

Steven reposted

TogetherAI's Chief Scientist @tri_dao announced Flash Attention v4 at the Hot Chips conference; it is up to 22% faster than the attention kernel implementation from NVIDIA's cuDNN library. Tri Dao was able to achieve this with 2 key algorithmic changes. Firstly, it uses a new online…

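The tweet is cut off, so FA4's new variant isn't shown here, but the classic building block it iterates on, online softmax, computes a softmax in one pass with a running max and a rescaled running sum. A minimal numpy version of the classic algorithm:

```python
import numpy as np

def online_softmax(x):
    """One-pass softmax: track a running max m and a running sum s of
    exp(x_i - m), rescaling s whenever m increases. Numerically stable
    without a separate max pass -- the trick FlashAttention builds on."""
    m, s = -np.inf, 0.0
    for v in x:
        m_new = max(m, v)
        s = s * np.exp(m - m_new) + np.exp(v - m_new)
        m = m_new
    return np.exp(x - m) / s

x = np.random.randn(1000)
ref = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
print(np.allclose(online_softmax(x), ref))  # True
```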

Steven reposted

Understanding the Linux Virtual Memory Manager

Steven reposted

I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: hanlab.mit.edu/blog/streaming…

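The mechanism in miniature: StreamingLLM keeps a few initial "sink" tokens plus a sliding window of recent tokens and evicts everything in between. A small sketch of which positions each new token may attend to (window sizes illustrative):

```python
import numpy as np

def streaming_mask(seq_len, n_sink=4, window=1020):
    """Boolean [seq_len, seq_len] mask: token i attends to the first
    n_sink tokens (attention sinks) and its last `window` predecessors."""
    idx = np.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]           # no attending to the future
    sink = idx[None, :] < n_sink                    # always keep the sink tokens
    recent = idx[:, None] - idx[None, :] < window   # sliding window
    return causal & (sink | recent)

m = streaming_mask(8, n_sink=2, window=3)
print(m.astype(int))
# Row i has 1s at columns 0-1 (the sinks) and the last 3 positions up to i.
```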

Steven reposted

just published a curated list of amazing blogs/articles on ai. highly selective. feel free to comment if you find anything interesting, will update it async. (link in replies)
