Felix Nguyen
@felixcantcode
generating values for stakeholders, internet plumber, http://mu2mi.com, blog: https://felixng.me, opinions are my own
New blog post: The bug that taught me more about PyTorch than years of using it started with a simple training loss plateau... ended up digging through optimizer states, memory layouts, kernel dispatch, and finally understanding how PyTorch works!
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…
one of the often overlooked factors behind deepseek’s success lies in how the team built their distributed infrastructure from the ground up (despite severe gpu constraints!). their custom communication library hfreduce replaced nccl and delivered substantially higher bandwidth…
You know how if you spend the whole day sitting on the couch watching TV, you get kind of restless yet somehow also too tired to get off your butt? Like you're tired *of* doing nothing, yet you're also tired *from* doing nothing? You know what I'm talking about, the state of…
Some patterns on how iterators in Rust play well with the borrow checker - Iterator-friendly APIs, Borrow splitting and reborrowing .. Iterators prefer borrows Iterators often yield references to avoid allocation: Keep using an iterator after borrowing it Use `by_ref()` to…
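A minimal sketch of the `by_ref()` pattern the post mentions, assuming a whitespace-separated line as input. The function name `split_header` and its parameters are hypothetical, chosen only for illustration; the post's own example is truncated and may differ.

```rust
/// Splits a line of whitespace-separated tokens into the first `n` tokens
/// and everything after them, using a single iterator for both passes.
/// (`split_header` is a hypothetical name, not taken from the post.)
pub fn split_header(line: &str, n: usize) -> (Vec<&str>, Vec<&str>) {
    let mut tokens = line.split_whitespace();

    // `by_ref()` borrows the iterator mutably, so `take(n)` consumes only
    // the first `n` items instead of taking ownership of `tokens`.
    let header: Vec<&str> = tokens.by_ref().take(n).collect();

    // `tokens` is still usable here and resumes where `take` stopped.
    let rest: Vec<&str> = tokens.collect();

    (header, rest)
}
```

Without `by_ref()`, `take(n)` would move `tokens`, and the second `collect()` would not compile.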
I have fine-tuned over 100 different LLMs/VLMs for various use cases over the last 1–2 years, and here is my framework whenever I pick a new project or problem statement: 1. Benchmark/Evals Any problem you are solving for should have an evaluation set that you can easily…
What should I name this plugin?
can this be an extension in vscode, it just freezes your vscode randomly and plays phonk
Last night I taught nanochat d32 how to count 'r' in strawberry (or similar variations). I thought this would be a good/fun example of how to add capabilities to nanochat and I wrote up a full guide here: github.com/karpathy/nanoc… This is done via a new synthetic task…
full workflow with training & converting to hf tokenizers: github.com/thepowerfuldee…
Of ~200 books I've read, the few that stayed with me over time and I find myself often thinking back to or referring to, in ~random order: All short stories by Ted Chiang, especially Exhalation, Division By Zero, Understand, The Story of Your Life, Liking What You See, The…
There are two somewhat related myths about neural networks in many intro ML courses that, I think, mislead more than they help. 1) The statement, "neural networks are powerful," is often followed by the citation to universal approximation theorem. 2) Neural networks are often…
Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…
i have been scaling law- and manifold hypothesis-pilled youtube.com/watch?v=5eqRuV…
AI can't cross this line and we don't know why.
nanochat now has a primordial identity and can talk a bit about itself and its capabilities (e.g. it knows it's nanochat d32 that cost $800, that it was built by me, that it can't speak languages other than English too well and why, etc.). This kind of customization is all done…
I fixed it :) deployed live now. This was done by doing a round of synthetic data generation to collect 1000 multi-turn conversations (given a bunch of information including the readme of the nanochat project), and then mixing that into midtraining and SFT. fun!
In Rust, how the borrow-checker shapes your API inputs and outputs .. The general design principle is to choose signatures that minimize ownership churn while keeping call sites clean and safe. Accepting input • Borrow when you only read: `fn parse(src: &str)` • Borrow…
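A small sketch of the "borrow when you only read" guideline, in the spirit of the post's `fn parse(src: &str)` signature. The functions `count_words` and `first_word` are hypothetical examples, not taken from the post.

```rust
/// Only reads its input, so it borrows a `&str` rather than taking a `String`.
/// (`count_words` is a hypothetical example function.)
pub fn count_words(src: &str) -> usize {
    src.split_whitespace().count()
}

/// Returns a slice borrowed from the input instead of allocating a new
/// `String`; the returned `&str` lives as long as `src` does.
/// (`first_word` is a hypothetical example function.)
pub fn first_word(src: &str) -> &str {
    src.split_whitespace().next().unwrap_or("")
}
```

Because both functions accept `&str`, a call site can pass a string literal or `&owned_string` and still use the owned value afterwards.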
NVIDIA over USB4 on MacBook is ready to try! * ADT-UT3G dock + any 30/40/50 series GPU * Disable SIP * Install driver `extra/usbgpu/tbgpu` * Install NVK compiler `brew install tinymesa` * Test with: `DEBUG=2 NV_NAK=1 NV=1 python3 test/test_tiny.py TestTiny.test_plus`
Sharding. Database sharding is one of the common techniques to scale a database horizontally. You split the db into small parts called shards and distribute them across machines. Shards are typically in the few hundreds or even thousands (for extremely large databases). Usually…
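One way to picture how a key is routed to a shard is hash-modulo placement. The post is cut off before it says which scheme it has in mind, so this is only an assumed illustration: `shard_for` is a hypothetical helper, and real systems often prefer range-based or consistent-hashing schemes to make rebalancing cheaper.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a key to one of `num_shards` shards by hashing it and taking the
/// remainder. Hypothetical helper illustrating hash-modulo placement only.
/// Panics if `num_shards` is zero.
pub fn shard_for<K: Hash>(key: &K, num_shards: u64) -> u64 {
    assert!(num_shards > 0, "need at least one shard");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    // The same key always hashes to the same value within one build,
    // so it is always routed to the same shard.
    hasher.finish() % num_shards
}
```

Note that changing `num_shards` remaps most keys under this scheme, which is one reason production systems tend to use something more elaborate.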
(1/2) i felt like no one actually teaches you a good framework for how to read (ML) papers well + fast, so i wrote this 5-minute read tldr: because so many papers suck, here's how to go through them quickly and revisit the good ones
I just published the full guide to building forms with the Field component: - TanStack Form & React Hook Form - Zod validation and displaying errors - Practical examples we’ll actually use - Inputs, Radios, Fieldset, Arrays & more Check it out. Link below.