
Mike Carroll

@mikecarroll_eng

Engineer. Previously @Facebook

Mike Carroll reposted

Training Andrej Karpathy’s Nanochat on 4x RTX 3090s at 225W each:

Step 2,694/21,400 (12.59% done)
Loss: 3.14
Runtime: 6.78 hours
Throughput: 3,600 tok/sec
Temps: 52-57°C
VRAM: 19GB/24GB per card
Total cost: $15 at 55h

Zero errors, perfectly stable
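The $15 figure is consistent with a back-of-the-envelope electricity estimate; a quick sanity check, assuming roughly $0.30/kWh (the rate is not stated in the tweet):

```python
gpus, watts_each, hours = 4, 225, 55            # numbers from the tweet
usd_per_kwh = 0.30                              # assumed electricity rate
energy_kwh = gpus * watts_each / 1000 * hours   # 0.9 kW * 55 h = 49.5 kWh
print(f"{energy_kwh} kWh -> ${energy_kwh * usd_per_kwh:.2f}")  # ~ $14.85
```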

Mike Carroll reposted

I don't like courses. Most were a waste of time. Yes, even at Stanford. If you're new to ML, take CS231N.

your honor i object, i dont know about harvard but stanford literally releases SOTA courses



Mike Carroll reposted

My meeting budget:
5 min - meet someone new
10 min - solve a problem
15 min - identify + solve a problem

Parkinson’s law: work expands so as to fill the time available for its completion.


Mike Carroll reposted

MIT's 6.851: Advanced Data Structures (Spring '21)

courses.csail.mit.edu/6.851/spring21/

This has been on my recommendation list for a while, and the memory hierarchy discussions are great in the context of cache-oblivious algorithms.

"Cache‑Oblivious Algorithms and Data Structures" by Erik D. Demaine erikdemaine.org/papers/BRICS20… This is a foundational survey on designing cache‑oblivious algorithms and data structures that perform as well as cache‑aware approaches that require hardcoding cache size (M) and block…

vivekgalatage's tweet image. "Cache‑Oblivious Algorithms and Data Structures" by Erik D. Demaine

erikdemaine.org/papers/BRICS20…

This is a foundational survey on designing cache‑oblivious algorithms and data structures that perform as well as cache‑aware approaches that require hardcoding cache size (M) and block…
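The survey's core trick is divide and conquer: recurse until the subproblem fits in cache at every level of the hierarchy, without ever naming a cache size M or block size B. A minimal sketch of the classic cache-oblivious matrix transpose (my own illustration, not code from the survey):

```python
import numpy as np

def transpose(a, out, r0, r1, c0, c1):
    """Cache-obliviously transpose a[r0:r1, c0:c1] into out[c0:c1, r0:r1].

    Recursively split the larger dimension; small blocks eventually fit
    in every cache level with no tuning to M or B.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= 16 and cols <= 16:       # base case: block is small enough
        out[c0:c1, r0:r1] = a[r0:r1, c0:c1].T
    elif rows >= cols:                  # split the taller dimension
        mid = r0 + rows // 2
        transpose(a, out, r0, mid, c0, c1)
        transpose(a, out, mid, r1, c0, c1)
    else:                               # split the wider dimension
        mid = c0 + cols // 2
        transpose(a, out, r0, r1, c0, mid)
        transpose(a, out, r0, r1, mid, c1)

a = np.arange(12).reshape(3, 4)
out = np.empty((4, 3), dtype=a.dtype)
transpose(a, out, 0, 3, 0, 4)
assert (out == a.T).all()
```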


Mike Carroll reposted

50 LLM Projects with Source Code to Become a Pro

1. Beginner-Level LLM Projects

→ Text Summarizer using OpenAI API
→ Chatbot for Customer Support
→ Sentiment Analysis with GPT Models
→ Resume Optimizer using LLMs
→ Product Description Generator
→ AI-Powered Grammar…
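A minimal sketch of the first beginner item on the list, a text summarizer using the OpenAI API (the model choice and prompt are my own; assumes the openai package is installed and OPENAI_API_KEY is set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(text: str) -> str:
    """Ask a chat model for a two-sentence summary of the given text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system", "content": "Summarize the user's text in two sentences."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(summarize("Cache-oblivious algorithms use recursion to exploit every "
                "level of the memory hierarchy without knowing cache parameters."))
```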

Mike Carroll reposted

I left my plans for the weekend to read this recent blog from HuggingFace 🤗 on how they maintain the most critical AI library: transformers.

→ 1M lines of Python,
→ 1.3M installations,
→ thousands of contributors,
→ a true engineering masterpiece,

Here's what I learned:…

Mike Carroll reposted

Excited to release new repo: nanochat!
(it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,…

Mike Carroll reposted

You're not depressed, you just lost your quest.


Mike Carroll reposted

🔥 Free Google Colab notebooks to implement every Machine Learning algorithm from scratch

Link in comment
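For a taste of what "from scratch" means here (my own minimal example, not taken from the notebooks): linear regression fit by batch gradient descent in plain NumPy.

```python
import numpy as np

# Fit y ≈ w*x + b by gradient descent on the mean squared error.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 200)   # true w = 3.0, b = 0.5

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    err = w * x + b - y                 # residuals
    w -= lr * 2 * np.mean(err * x)      # d(MSE)/dw
    b -= lr * 2 * np.mean(err)          # d(MSE)/db

print(f"w={w:.2f}, b={b:.2f}")          # ≈ 3.00, 0.50
```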

Mike Carroll reposted

how i got here:
> i used to be and still tend towards having an obsessive/addictive personality
> put many years of my life into video games
> it was only 2 years ago i started to turn that around because i got other interests and started really looking forward to the future
>…


Mike Carroll reposted

found a repo that has a massive collection of Machine Learning system design case studies used in the real world, from Stripe, Spotify, Netflix, Meta, GitHub, Twitter/X, and much more

link in replies

Mike Carroll reposted

Copy-pasting PyTorch code is fast; using an AI coding model is even faster; but both skip the learning. That's why I asked my students to write by hand ✍️.

🔽 Download: byhand.ai/pytorch

After the exercise, my students can understand what every line really does and…
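For a sense of what working by hand buys you (my own illustration, not material from byhand.ai): nn.Linear is just a matrix multiply plus a bias, and spelling that out makes the shapes and the weight layout explicit.

```python
import torch

x = torch.tensor([[1.0, 2.0]])            # (1, 2) input
W = torch.tensor([[0.5, -1.0],
                  [0.3,  0.2]])            # (2, 2) weights
b = torch.tensor([0.1, 0.0])               # (2,) bias

manual = x @ W + b                         # by hand: matmul plus bias

layer = torch.nn.Linear(2, 2)
with torch.no_grad():
    layer.weight.copy_(W.T)                # nn.Linear stores weight as (out_features, in_features)
    layer.bias.copy_(b)

assert torch.allclose(manual, layer(x))    # same computation, spelled out
```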


Mike Carroll reposted

70 Python Projects with Source Code for Developers

Step 1: Beginner Foundations

→ Hello World Web App
→ Calculator (CLI)
→ To-Do List CLI
→ Number Guessing Game
→ Countdown Timer
→ Dice Roll Simulator
→ Coin Flip Simulator
→ Password Generator
→ Palindrome Checker
→…
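As a taste of the beginner tier, a minimal version of one listed item, the palindrome checker (my own sketch):

```python
def is_palindrome(s: str) -> bool:
    """True if s reads the same forwards and backwards,
    ignoring case and non-alphanumeric characters."""
    cleaned = [ch.lower() for ch in s if ch.isalnum()]
    return cleaned == cleaned[::-1]

assert is_palindrome("A man, a plan, a canal: Panama")
assert not is_palindrome("hello")
```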


Mike Carroll reposted

everything you need to get started in one repo


Mike Carroll reposted

System prompts are getting outdated!

Here's a counterintuitive lesson from building real-world Agents:

Writing giant system prompts doesn't improve an Agent's performance; it often makes it worse.

For example, you add a rule about refund policies. Then one about tone. Then…
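One common alternative the tweet is gesturing at: keep rules in a store and inject only the ones relevant to the current request, instead of growing one monolithic system prompt. A minimal sketch of the idea (the rule topics, texts, and keyword routing below are hypothetical):

```python
# rule store; topics and texts are made-up examples
RULES = {
    "refund": "Refunds are allowed within 30 days with a receipt.",
    "tone": "Be concise and friendly; never promise timelines.",
    "escalation": "Escalate legal threats to a human agent.",
}

def build_system_prompt(user_message: str) -> str:
    """Inject only the rules relevant to this request.

    Naive keyword matching stands in for real retrieval or classification.
    """
    relevant = [text for topic, text in RULES.items()
                if topic in user_message.lower()]
    return "\n".join(["You are a customer-support assistant.", *relevant])

print(build_system_prompt("I want a refund for my order"))
```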

Mike Carroll reposted

NetHack is the best benchmark you've never heard of. I'd say that it makes ALE look like a toy, but well... it is

Introducing Scalable Option Learning (SOL☀️), a blazingly fast hierarchical RL algorithm that makes progress on long-horizon tasks and demonstrates positive scaling trends on the largely unsolved NetHack benchmark, when trained for 30 billion samples. Details, paper and code in >



Mike Carroll reposted

it's insane to me how little attention the llm.q repo has

it's a fully C/C++/CUDA implementation of multi-gpu (zero + fsdp), quantized LLM training with support for selective AC

it's genuinely the coolest OSS thing I've seen this year (what's crazier is 1 person wrote it!)
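For readers unfamiliar with "selective AC": activation checkpointing recomputes chosen layers' activations during the backward pass instead of storing them, trading compute for memory; "selective" means applying it only to some blocks rather than all of them. llm.q implements this in C/C++/CUDA; below is a minimal PyTorch sketch of the idea, my own illustration:

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    def __init__(self, dim: int, checkpointed: bool):
        super().__init__()
        self.ff = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )
        self.checkpointed = checkpointed

    def forward(self, x):
        if self.checkpointed and self.training:
            # activations inside self.ff are recomputed during backward
            return x + checkpoint(self.ff, x, use_reentrant=False)
        return x + self.ff(x)

# "selective": checkpoint only every other block, not all of them
blocks = torch.nn.Sequential(*[Block(64, checkpointed=(i % 2 == 0)) for i in range(8)])
out = blocks(torch.randn(4, 64))
out.sum().backward()  # less activation memory on checkpointed blocks, extra forward compute
```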

Mike Carroll reposted

Finally had a chance to listen through this pod with Sutton, which was interesting and amusing. As background, Sutton's "The Bitter Lesson" has become a bit of a biblical text in frontier LLM circles. Researchers routinely talk about and ask whether this or that approach or idea…

.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training…



As the divide between the super-rich and the rest widens, this strategy becomes increasingly relevant.

