Felix Nguyen

@felixcantcode

generating values for stakeholders, internet plumber, http://mu2mi.com, blog: https://felixng.me, opinions are my own

Felix Nguyen reposted

Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn't, and how to make it run reliably huggingface.co/spaces/Hugging…

[eliebakouch's tweet image]

Felix Nguyen reposted

Please steal my AI research ideas. This is a list of research questions and concrete experiments I would love to see done, but don't have bandwidth to get to. If you are looking to break into AI research (e.g. as an undergraduate, or a software engineer in industry), these are…


Felix Nguyen reposted

New blog post: The bug that taught me more about PyTorch than years of using it

started with a simple training loss plateau... ended up digging through optimizer states, memory layouts, kernel dispatch, and finally understanding how PyTorch works!

[ElanaPearl's tweet image]

Felix Nguyen reposted

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…

[thinkymachines's tweet image]

Felix Nguyen reposted

one of the often overlooked factors behind deepseek’s success lies in how the team built their distributed infrastructure from the ground up (despite severe gpu constraints!). their custom communication library hfreduce replaced nccl and delivered substantially higher bandwidth…

[novasarc01's tweet images]

Felix Nguyen reposted

You know how if you spend the whole day sitting on the couch watching TV, you get kind of restless yet somehow also too tired to get off your butt? Like you're tired *of* doing nothing, yet you're also tired *from* doing nothing? You know what I'm talking about, the state of…


Felix Nguyen reposted

Some patterns on how iterators in Rust play well with the borrow checker - Iterator-friendly APIs, Borrow splitting and reborrowing ..

Iterators prefer borrows
Iterators often yield references to avoid allocation:

Keep using an iterator after borrowing it
Use `by_ref()` to…

[debasishg's tweet images]
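Here is a minimal Rust sketch of the `by_ref()` pattern from the post above. It uses splitting header lines from body lines as an illustrative task; `split_header` is a hypothetical name, not from the post. `Iterator::by_ref` lends the iterator mutably to an adapter like `take_while`, so the same iterator keeps going afterwards. `copied()` keeps the items as borrowed `&str` slices instead of allocating new strings.

```rust
/// Hypothetical helper: split lines into (header, body) at the first blank line.
/// Both halves borrow from the input, so no strings are allocated.
fn split_header<'a>(lines: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    // `iter()` yields `&&str`; `copied()` turns each item into a plain `&str`.
    let mut iter = lines.iter().copied();

    // `by_ref()` lends `iter` mutably, so `take_while` consumes from it
    // without taking ownership; `iter` is still usable below.
    let header: Vec<&'a str> = iter.by_ref().take_while(|line| !line.is_empty()).collect();

    // Continue from where `take_while` stopped.
    let body: Vec<&'a str> = iter.collect();
    (header, body)
}
```

One subtlety: `take_while` also consumes the first element that fails the predicate (here the blank separator line), so that line ends up in neither half.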

Felix Nguyen reposted

I have fine-tuned over 100 different LLMs/VLMs for various use cases over the last 1–2 years, and here is my framework whenever I pick a new project or problem statement:

1. Benchmark/Evals
Any problem you are solving for should have an evaluation set that you can easily…


Felix Nguyen reposted

What should I name this plugin?

can this be an extension in vscode, it just freezes your vscode randomly and plays phonk



Felix Nguyen reposted

Last night I taught nanochat d32 how to count 'r' in strawberry (or similar variations). I thought this would be a good/fun example of how to add capabilities to nanochat and I wrote up a full guide here: github.com/karpathy/nanoc… This is done via a new synthetic task…

[karpathy's tweet image]
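The post is cut off before describing the synthetic task. Below is a hypothetical Rust sketch of what a single letter-counting training example could look like. `make_count_example` and the spelled-out answer format are assumptions, not nanochat's actual task code. Our guess at the rationale: spelling the word out one character at a time gives the model an explicit intermediate step before it states the count.

```rust
/// Hypothetical helper (not nanochat's code): build one synthetic
/// (question, answer) pair for counting a letter in a word.
/// Matching is exact and case-sensitive.
fn make_count_example(word: &str, letter: char) -> (String, String) {
    // Ground-truth count computed from the characters themselves.
    let count = word.chars().filter(|&c| c == letter).count();

    let question = format!("How many '{}' are in the word \"{}\"?", letter, word);

    // Spell the word out character by character, comma-separated.
    let spelled: Vec<String> = word.chars().map(|c| c.to_string()).collect();
    let answer = format!(
        "Spelling it out: {}. The letter '{}' appears {} time(s).",
        spelled.join(","),
        letter,
        count
    );
    (question, answer)
}
```

For example, `make_count_example("strawberry", 'r')` produces a question about "strawberry" and an answer ending in "appears 3 time(s).".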

Felix Nguyen reposted

Of ~200 books I've read, the few that stayed with me over time and I find myself often thinking back to or referring to, in ~random order: All short stories by Ted Chiang, especially Exhalation, Division By Zero, Understand, The Story of Your Life, Liking What You See, The…


me after watching karpathy x dwarkesh

[felixcantcode's tweet image]

Felix Nguyen reposted

There are two somewhat related myths about neural networks in many intro ML courses that, I think, mislead more than they help. 1) The statement, "neural networks are powerful," is often followed by the citation to universal approximation theorem. 2) Neural networks are often…

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…



Felix Nguyen reposted

nanochat now has a primordial identity and can talk a bit about itself and its capabilities (e.g. it knows it's nanochat d32 that cost $800, that it was built by me, that it can't speak languages other than English too well and why, etc.). This kind of customization is all done…

I fixed it :) deployed live now. This was done by doing a round of synthetic data generation to collect 1000 multi-turn conversations (given a bunch of information including the readme of the nanochat project), and then mixing that into midtraining and SFT. fun!

[karpathy's tweet image]


Felix Nguyen reposted

In Rust, how the borrow-checker shapes your API inputs and outputs ..

The general design principle is to choose signatures that minimize ownership churn while keeping call sites clean and safe.

Accepting input

• Borrow when you only read: `fn parse(src: &str)`
• Borrow…

[debasishg's tweet images]
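Here is a small Rust sketch of the input/output guideline quoted above. `parse` borrows its input, as in the post. Its field-splitting body is an illustrative assumption. The owned-input constructor and the borrowed getter are common Rust conventions added for illustration, and `Config` is a hypothetical type.

```rust
/// Borrow when you only read: the caller keeps ownership of `src`.
/// With a single reference parameter, lifetime elision ties the returned
/// slices to `src`, so no strings are copied.
fn parse(src: &str) -> Vec<&str> {
    src.split(',').map(str::trim).filter(|f| !f.is_empty()).collect()
}

/// Hypothetical type used to show owned inputs and borrowed outputs.
struct Config {
    name: String,
}

impl Config {
    /// Take ownership when you store the value: callers can move an existing
    /// `String` in, or pass a `&str` that gets converted once.
    fn new(name: impl Into<String>) -> Self {
        Config { name: name.into() }
    }

    /// Return a borrow tied to `&self` instead of cloning on every call.
    fn name(&self) -> &str {
        &self.name
    }
}
```

The design choice: `parse` never allocates per field, and `Config::new` allocates at most once, at the point where the value is actually stored.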

Felix Nguyen reposted

NVIDIA over USB4 on MacBook is ready to try!

* ADT-UT3G dock + any 30/40/50 series GPU
* Disable SIP
* Install driver `extra/usbgpu/tbgpu`
* Install NVK compiler `brew install tinymesa`
* Test with:
`DEBUG=2 NV_NAK=1 NV=1 python3 test/test_tiny.py TestTiny.test_plus`

[__tinygrad__'s tweet image]

Felix Nguyen reposted

Sharding. Database sharding is one of the common techniques to scale a database horizontally. You split the db into small parts called shards and distribute them across machines. There are typically a few hundred shards, or even thousands for extremely large databases. Usually…
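The post is truncated before it says how rows are assigned to shards. One common approach is hash-based routing. Below is a minimal Rust sketch that assumes a fixed shard count and modulo hashing; `shard_for` is a hypothetical helper, not taken from the post.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: map a key to a shard index in `0..num_shards`
/// by hashing the key and taking the remainder.
fn shard_for<K: Hash>(key: &K, num_shards: u64) -> u64 {
    assert!(num_shards > 0, "need at least one shard");
    // `DefaultHasher::new()` uses fixed keys, so the result is deterministic
    // within a given build of the program.
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_shards
}
```

Two caveats. First, the standard library does not guarantee that `DefaultHasher`'s algorithm stays the same across Rust releases, so a real system would pin a specific hash function. Second, with plain modulo, changing `num_shards` remaps most keys, which is why consistent hashing or a shard directory is often used instead.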

