Felix Nguyen

@felixcantcode

generating values for stakeholders, internet plumber, http://mu2mi.com, blog: https://felixng.me opinions are my own

Joined January 2024

479 posts 32 followers 694 following

Felix Nguyen reposted

elie

@eliebakouch

15h

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably huggingface.co/spaces/Hugging…

Felix Nguyen reposted

Tanishq Kumar

@tanishqkumar07

Oct 28

Please steal my AI research ideas. This is a list of research questions and concrete experiments I would love to see done, but don't have bandwidth to get to. If you are looking to break into AI research (e.g. as an undergraduate, or a software engineer in industry), these are…

Felix Nguyen reposted

Elana Simon

@ElanaPearl

Oct 23

New blog post: The bug that taught me more about PyTorch than years of using it

started with a simple training loss plateau... ended up digging through optimizer states, memory layouts, kernel dispatch, and finally understanding how PyTorch works!

Felix Nguyen reposted

Thinking Machines

@thinkymachines

Oct 27

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
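The post above is truncated and does not spell out the objective, so the following is only a rough sketch under an assumption: the student samples its own completion (the on-policy part, as in RL), and the teacher then scores every sampled token (the dense per-token signal, as in SFT). Here the per-token reward is taken to be the negative of a single-sample reverse-KL estimate; the function name and the log-prob values are hypothetical.

```python
import math


def per_token_rewards(student_logprobs, teacher_logprobs):
    """Reward for each token the student sampled.

    Assumed formulation: reward_t = log p_teacher(tok_t) - log p_student(tok_t),
    i.e. the negative of a one-sample reverse-KL estimate at position t.
    """
    if len(student_logprobs) != len(teacher_logprobs):
        raise ValueError("need one teacher log-prob per sampled token")
    return [t - s for s, t in zip(student_logprobs, teacher_logprobs)]


# Hypothetical log-probs for a 3-token completion sampled from the student.
student = [math.log(0.5), math.log(0.9), math.log(0.6)]
teacher = [math.log(0.5), math.log(0.1), math.log(0.8)]

rewards = per_token_rewards(student, teacher)
for position, reward in enumerate(rewards):
    print(position, round(reward, 3))
```

Every position gets its own reward, so a single bad token is penalized where it occurs instead of through one score for the whole completion.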

Felix Nguyen reposted

λux

@novasarc01

Oct 27

one of the often overlooked factors behind deepseek’s success lies in how the team built their distributed infrastructure from the ground up (despite severe gpu constraints!). their custom communication library hfreduce replaced nccl and delivered substantially higher bandwidth…

Felix Nguyen reposted

Justin Skycak

@justinskycak

Oct 25

You know how if you spend the whole day sitting on the couch watching TV, you get kind of restless yet somehow also too tired to get off your butt? Like you're tired *of* doing nothing, yet you're also tired *from* doing nothing? You know what I'm talking about, the state of…

Felix Nguyen reposted

Debasish (দেবাশিস্) Ghosh 🇮🇳

@debasishg

Oct 26

Some patterns on how iterators in Rust play well with the borrow checker: iterator-friendly APIs, borrow splitting, and reborrowing.

Iterators prefer borrows
Iterators often yield references to avoid allocation:

Keep using an iterator after borrowing it
Use `by_ref()` to…

Felix Nguyen reposted

Adithya S K

@adithya_s_k

Oct 25

I have fine-tuned over 100 different LLMs/VLMs for various use cases over the last 1–2 years, and here is my framework whenever I pick a new project or problem statement: 1. Benchmark/Evals Any problem you are solving for should have an evaluation set that you can easily…

Felix Nguyen reposted

sahaj

@sahaj__b

Oct 24

What should I name this plugin?

0xforloop

@forloopcodes

Oct 22

can this be an extension in vscode, it just freezes your vscode randomly and plays phonk

Felix Nguyen reposted

Eric Zhang

@ekzhang1

Oct 24

192 weeks notes.ekzhang.com/reflections/19…

192 Weeks

“How many weeks are between Jan 31, 2022 and Oct 6, 2025?”

Source: notes.ekzhang.com
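The question quoted in the link card can be checked with plain date arithmetic. A minimal sketch using Python's standard `datetime` module; the helper name `whole_weeks_between` is made up for illustration.

```python
from datetime import date


def whole_weeks_between(start: date, end: date) -> tuple[int, int]:
    """Return (full weeks, leftover days) between two calendar dates."""
    days = (end - start).days
    return divmod(days, 7)


weeks, extra_days = whole_weeks_between(date(2022, 1, 31), date(2025, 10, 6))
print(weeks, extra_days)  # 192 0
```

The span is 1,344 days, which divides evenly into 192 weeks with no days left over.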

Felix Nguyen reposted

Andrej Karpathy

@karpathy

Oct 24

Last night I taught nanochat d32 how to count 'r' in strawberry (or similar variations). I thought this would be a good/fun example of how to add capabilities to nanochat and I wrote up a full guide here: github.com/karpathy/nanoc… This is done via a new synthetic task…
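The post is cut off before it describes the synthetic task, and the linked guide is the authoritative source. As a loose illustration only, and not nanochat's actual code, a letter-counting task could be generated along these lines; every name below (`count_letter`, `make_example`, `make_dataset`, the prompt wording) is hypothetical.

```python
import random


def count_letter(word: str, letter: str) -> int:
    """Ground-truth label: case-insensitive occurrences of letter in word."""
    return word.lower().count(letter.lower())


def make_example(word: str, letter: str) -> dict:
    """One prompt/answer pair for the counting task."""
    return {
        "prompt": f"How many '{letter}' are in the word '{word}'?",
        "answer": str(count_letter(word, letter)),
    }


def make_dataset(words: list[str], n: int, seed: int = 0) -> list[dict]:
    """Sample n examples; the letter is drawn from the word, so counts are >= 1."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        word = rng.choice(words)
        letter = rng.choice(word)
        examples.append(make_example(word, letter))
    return examples


print(make_example("strawberry", "r")["answer"])  # 3
```

Because the labels are computed programmatically, the dataset can be made as large and as varied as needed without manual annotation.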

Felix Nguyen reposted

George Grigorev

@iamgrigorev

Oct 22

full workflow with training & converting to hf tokenizers: github.com/thepowerfuldee…

sample_efficient_gpt/sample_efficient_gpt/tokenizer/train.sh at main · thepowerfuldeez/sample_eff...

Training framework with a goal to explore the frontier of sample efficiency of small language models - thepowerfuldeez/sample_efficient_gpt

Source: github.com

Felix Nguyen reposted

Andrej Karpathy

@karpathy

Dec 9

Of ~200 books I've read, the few that stayed with me over time and I find myself often thinking back to or referring to, in ~random order: All short stories by Ted Chiang, especially Exhalation, Division By Zero, Understand, The Story of Your Life, Liking What You See, The…

Felix Nguyen

@felixcantcode

Oct 23

me after watching karpathy x dwarkesh

Felix Nguyen reposted

Jiaqi Ma

@Jiaqi_Ma_

Oct 3

There are two somewhat related myths about neural networks in many intro ML courses that, I think, mislead more than they help. 1) The statement, "neural networks are powerful," is often followed by the citation to universal approximation theorem. 2) Neural networks are often…

Andrej Karpathy

@karpathy

Oct 1

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…

Felix Nguyen

@felixcantcode

Oct 22

i have been scaling law- and manifold hypothesis-pilled youtube.com/watch?v=5eqRuV…

AI can't cross this line and we don't know why.

Source: youtube.com

Felix Nguyen reposted

Andrej Karpathy

@karpathy

Oct 21

nanochat now has a primordial identity and can talk a bit about itself and its capabilities (e.g. it knows it's nanochat d32 that cost $800, that it was built by me, that it can't speak languages other than English too well and why, etc.). This kind of customization is all done…

Andrej Karpathy

@karpathy

Oct 21

I fixed it :) deployed live now. This was done by doing a round of synthetic data generation to collect 1,000 multi-turn conversations (given a bunch of information including the readme of the nanochat project), and then mixing that into midtraining and SFT. fun!

Felix Nguyen reposted

Debasish (দেবাশিস্) Ghosh 🇮🇳

@debasishg

Oct 20

In Rust, how the borrow-checker shapes your API inputs and outputs ..

The general design principle is to choose signatures that minimize ownership churn while keeping call sites clean and safe.

Accepting input

• Borrow when you only read: `fn parse(src: &str)`
• Borrow…

Felix Nguyen reposted

the tiny corp

@__tinygrad__

Oct 20

NVIDIA over USB4 on MacBook is ready to try!

* ADT-UT3G dock + any 30/40/50 series GPU
* Disable SIP
* Install driver `extra/usbgpu/tbgpu`
* Install NVK compiler `brew install tinymesa`
* Test with:
`DEBUG=2 NV_NAK=1 NV=1 python3 test/test_tiny.py TestTiny.test_plus`

Felix Nguyen reposted

v

@iavins

Oct 19

Sharding. Database sharding is one of the common techniques to scale a database horizontally. You split the db into small parts called shards and distribute them across machines. Shards are typically in the few hundreds or even thousands (for extremely large databases). Usually…
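The post is truncated before it says how keys are assigned to shards, so this is only a sketch of one common approach, assumed here for illustration: hash the key to pick a logical shard, then map shards onto machines. The function names, key format, and machine names are hypothetical.

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Map a key to a logical shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and so would not give a stable routing.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def machine_for_shard(shard: int, machines: list[str]) -> str:
    """Place logical shards on machines round-robin (assumed placement rule)."""
    return machines[shard % len(machines)]


machines = ["db-a", "db-b", "db-c"]
shard = shard_for_key("user:42", num_shards=256)
print(shard, machine_for_shard(shard, machines))
```

Keeping many more logical shards than machines means capacity can be added by moving whole shards to a new machine, without rehashing individual keys.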