
Gordon Lee🍀

@redoragd

Pinned

CLICK github.com/doragd AND BECOME FRIENDS 🌸🌸

Really Appreciate



Gordon Lee🍀 reposted

Very cool blog by @character_ai diving into how they trained their proprietary model Kaiju (13B, 34B, 110B) before switching to OSS models, and spoiler: it has Noam Shazeer written all over it. Most of the choices for model design (MQA, SWA, KV Cache, Quantization) are not to…

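A minimal sketch of multi-query attention (MQA), one of the design choices named above: all query heads share a single key/value head, which shrinks the KV cache by roughly the head count. The shapes and names below are illustrative, not Character.AI's actual code.

import torch
import torch.nn.functional as F

# Multi-query attention: n_head query heads, but only 1 shared K/V head,
# so the KV cache is n_head times smaller than in standard multi-head attention.
B, T, n_head, d_head = 2, 16, 8, 64
q = torch.randn(B, n_head, T, d_head)   # per-head queries
k = torch.randn(B, 1, T, d_head)        # single shared key head
v = torch.randn(B, 1, T, d_head)        # single shared value head

# Broadcasting expands the shared K/V across all query heads.
scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (B, n_head, T, T)
out = F.softmax(scores, dim=-1) @ v                # (B, n_head, T, d_head)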

Gordon Lee🍀 reposted

Multi-vector embeddings (e.g., ColBERT) are powerful but expensive to scale. MUVERA cuts their memory costs by 70%. MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) is an interesting approach by Google Research. It transforms multi-vector representations into…

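A rough sketch of the fixed-dimensional-encoding idea as I read it (not Google's implementation): hash each token vector into a bucket with random hyperplanes, sum the vectors per bucket, and concatenate, so a variable-size multi-vector set becomes one fixed-length vector that can be scored with a single dot product.

import numpy as np

def fde(token_vecs, planes):
    # token_vecs: (n_tokens, d); planes: (n_bits, d) random hyperplanes.
    n_bits = planes.shape[0]
    n_buckets = 2 ** n_bits
    d = token_vecs.shape[1]
    out = np.zeros((n_buckets, d))
    # SimHash-style bucketing: sign pattern against the hyperplanes.
    bits = (token_vecs @ planes.T > 0).astype(int)      # (n_tokens, n_bits)
    bucket_ids = bits @ (2 ** np.arange(n_bits))        # (n_tokens,)
    for vec, b in zip(token_vecs, bucket_ids):
        out[b] += vec
    return out.ravel()  # fixed length: n_buckets * d

rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 128))                  # 2^3 = 8 buckets
doc_fde = fde(rng.standard_normal((40, 128)), planes)   # 40 token vectors in
query_fde = fde(rng.standard_normal((5, 128)), planes)  # 5 token vectors in
score = doc_fde @ query_fde                             # one dot product out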

Gordon Lee🍀 reposted

wtf, an 80-layer 321M model???

Synthetic playgrounds enabled a series of controlled experiments that led us to favor an extreme-depth design. We selected an 80-layer architecture for Baguettotron, with improvements across the board on memorization and logical reasoning: huggingface.co/PleIAs/Baguett…

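Some back-of-envelope arithmetic on why 80 layers at 321M parameters is striking, assuming a standard transformer block with about 12·d² parameters per layer (embeddings ignored):

# A standard block has ~4*d^2 params in attention projections
# plus ~8*d^2 in a 4x-expansion MLP, so ~12*d^2 per layer.
layers, target_params = 80, 321e6
d_model = (target_params / (12 * layers)) ** 0.5
print(round(d_model))  # ~578: an unusually narrow width for that many layers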


Gordon Lee🍀 reposted

Zico Kolter, head of the Machine Learning Department at Carnegie Mellon, is launching a new course, "Intro to Modern AI", built around constructing a PyTorch chatbot from scratch. It covers how the AI we all interact with every day actually runs. The entire curriculum at the university I attended in China was purchased from CMU; having experienced it firsthand, CMU is genuinely strong when it comes to teaching.…

I'm teaching a new "Intro to Modern AI" course at CMU this Spring: modernaicourse.org. It's an early-undergrad course on how to build a chatbot from scratch (well, from PyTorch). The course name has bothered some people – "AI" usually means something much broader in academic…



Gordon Lee🍀 reposted

MiniMind: train a small GPT for ¥3 in 2 hours

Train a small LLM from scratch with just 0.02B parameters, no third-party wrapper libraries, everything reimplemented in plain PyTorch. It's both an open-source LLM reproduction and a hands-on beginner tutorial for LLMs.

GitHub: github.com/jingyaogong/mi…
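For flavor, a toy version of what "from scratch, pure PyTorch" looks like (an illustrative next-token loop on random tokens, not MiniMind's actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d, ctx = 256, 64, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)       # token embeddings
        self.pos = nn.Embedding(ctx, d)         # learned positions
        self.block = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.head = nn.Linear(d, vocab)         # logits over the vocabulary

    def forward(self, x):
        h = self.emb(x) + self.pos(torch.arange(x.size(1), device=x.device))
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.head(self.block(h, src_mask=mask))  # causal next-token logits

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
data = torch.randint(0, vocab, (8, ctx + 1))    # stand-in for real tokenized text
for step in range(100):
    logits = model(data[:, :-1])                # predict token t+1 from tokens <= t
    loss = F.cross_entropy(logits.reshape(-1, vocab), data[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()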


Gordon Lee🍀 reposted

7+ main precision formats used in AI

▪️ FP32
▪️ FP16
▪️ BF16
▪️ FP8 (E4M3 / E5M2)
▪️ FP4
▪️ INT8/INT4
▪️ 2-bit (ternary/binary quantization)

General trend: higher precision for training, lower precision for inference.

Save the list and learn more about these formats here:…

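A quick PyTorch illustration of the train-high, infer-low pattern (a sketch with naive symmetric INT8 quantization; real pipelines use dedicated quantization tooling):

import torch

w = torch.randn(4, 4)                  # FP32 master weights, typical for training
print(w.dtype)                         # torch.float32

w_bf16 = w.to(torch.bfloat16)          # BF16: FP32's exponent range, fewer mantissa bits
w_fp16 = w.to(torch.float16)           # FP16: more mantissa, narrower range

# Naive symmetric INT8 quantization for inference (sketch, not production code):
scale = w.abs().max() / 127
w_int8 = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
w_deq = w_int8.float() * scale         # dequantize to inspect the rounding error
print((w - w_deq).abs().max())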

Gordon Lee🍀 reposted

My new field guide to alternatives to standard LLMs: Gated DeltaNet hybrids (Qwen3-Next, Kimi Linear), text diffusion, code world models, and small reasoning transformers. magazine.sebastianraschka.com/p/beyond-stand…


Gordon Lee🍀 reposted

With the release of the Kimi Linear LLM last week, we can definitely see that efficient, linear attention variants have seen a resurgence in recent months. Here's a brief summary of what happened. First, linear attention variants have been around for a long time, and I remember…

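The core trick behind linear attention, sketched in its simplest non-causal form (generic kernelized attention, not Kimi Linear's specific design): replace softmax(QKᵀ)V with φ(Q)(φ(K)ᵀV), so the small (d × d) term φ(K)ᵀV is computed once and cost grows linearly with sequence length rather than quadratically.

import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (B, T, d). Feature map phi(x) = elu(x) + 1 keeps values positive.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(1, 2) @ v                           # (B, d, d): O(T*d^2), not O(T^2)
    z = q @ k.sum(dim=1, keepdim=True).transpose(1, 2)   # normalizer, (B, T, 1)
    return (q @ kv) / (z + eps)                          # (B, T, d)

out = linear_attention(torch.randn(2, 128, 64),
                       torch.randn(2, 128, 64),
                       torch.randn(2, 128, 64))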

Gordon Lee🍀 reposted

Want to dig into how AI systems like ChatGPT and Claude are trained, especially the mechanics of how they get smarter through human feedback? Check out "Reinforcement Learning of Large Language Models", a course by Ernest K. Ryu, a mathematics professor at the University of California, with slides and videos free to study. The course starts from deep reinforcement learning fundamentals and works step by step toward Transformer…


Gordon Lee🍀 reposted

HuggingFace published an extremely long technical blog post (200 pages, 2-4 days to read) fully documenting the team's end-to-end process of training SmolLM3. A must-read for any team that wants to train small models!…


Gordon Lee🍀 reposted

Beautiful technical debugging detective longread that starts with a suspicious loss curve and ends all the way down in the Objective-C++ depths of PyTorch's MPS backend, where addcmul_ silently fails on non-contiguous output tensors. I wonder how long before an LLM can do all of this.

New blog post: The bug that taught me more about PyTorch than years of using it started with a simple training loss plateau... ended up digging through optimizer states, memory layouts, kernel dispatch, and finally understanding how PyTorch works!

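A small sketch of the failure mode described above (illustrative; the real bug lives in kernel dispatch inside the MPS backend): a transposed view is non-contiguous, and in-place ops on such views are exactly where layout-sensitive backend bugs hide.

import torch

a = torch.randn(4, 4)
out = torch.zeros(4, 4).t()        # transposed view: same storage, non-contiguous
print(out.is_contiguous())         # False

# addcmul_ computes out += value * tensor1 * tensor2 in place.
out.addcmul_(a, a, value=0.1)      # correct on CPU; per the post, the MPS kernel
                                   # silently left such outputs unchanged

safe = out.contiguous()            # materializing a contiguous copy sidesteps
                                   # layout-sensitive kernel paths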


Gordon Lee🍀 reposted

A tutorial on how to add a new model architecture to the llama.cpp inference engine! It's by pwilkin, yes, the same person who added the Qwen3-Next architecture to llama.cpp a few days ago. The tutorial is so good I think it could even be used as a prompt: hand a new architecture plus this tutorial to a large model and let it implement the architecture you need. Link: github.com/ggml-org/llama…


Gordon Lee🍀 reposted

After spending 60 hours deconstructing Elon Musk's biography, Founders Podcast host David Senra distilled 100 rules from Elon's thirty-year career. Highly recommend this episode! Listed here are 31 extremely valuable company-operating principles: 1. Mission first; 2. Never retreat; 3. A maniacal sense of urgency is our operating principle; 4. Product design should be engineer-driven; 5.…

New episode: "How Elon Works" This episode covers the insanely valuable company-building principles of Elon Musk A few notes from the episode: 1. The mission comes first. 2. Retreat is not an option. 3. A maniacal sense of urgency is our operating principle. 4. Product…



Gordon Lee🍀 reposted

oh god


Gordon Lee🍀 reposted

Tiny Reasoning Language Model (trlm-135) - Technical Blogpost⚡ Three weeks ago, I shared a weekend experiment: trlm-135, a tiny language model taught to think step-by-step. The response was incredible, and now the full technical report is live: shekswess.github.io/tiny-reasoning…


Gordon Lee🍀 reposted

I finally understand GPUs thanks to this


Gordon Lee🍀 reposted

Want to learn advanced reverse engineering, especially malware analysis, but online material is either too scattered or too theoretical with no hands-on practice, and you don't know where to start? Conveniently, I found the complete university course CS7038-Malware-Analysis on GitHub, which provides a systematic learning path from zero to mastery.…


Gordon Lee🍀 reposted

wrote a blog on KV caching. link in comments.

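For context on the topic, a bare-bones KV-cache decoding sketch (illustrative shapes, not the blog's code): keys and values of past tokens are computed once and appended, so each new token attends over the cache instead of re-running the whole prefix.

import torch
import torch.nn.functional as F

d = 64
k_cache, v_cache = [], []        # grow by one entry per generated token

def decode_step(q_new, k_new, v_new):
    k_cache.append(k_new)        # reuse past K/V instead of recomputing them
    v_cache.append(v_new)
    K = torch.cat(k_cache)       # (t, d)
    V = torch.cat(v_cache)       # (t, d)
    attn = F.softmax(q_new @ K.T / d ** 0.5, dim=-1)   # (1, t)
    return attn @ V                                     # (1, d)

for _ in range(5):   # one forward per new token: O(t) attention, not O(t^2)
    out = decode_step(torch.randn(1, d), torch.randn(1, d), torch.randn(1, d))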

Gordon Lee🍀 reposted

How to become expert at thing:
1. iteratively take on concrete projects and accomplish them depth wise, learning "on demand" (ie don't learn bottom up breadth wise)
2. teach/summarize everything you learn in your own words
3. only compare yourself to younger you, never to others

