
Gary’s Notebook 🧙‍♂️

@gary_doesnt_lai

Tip for debugging with an LLM: instead of asking it to just stare at the code and guess what's wrong, ask it to write a script that prints out everything you need to diagnose the issue, then give it the output. Suddenly your LLM becomes 10x better at debugging.
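A minimal sketch of the idea (the tokenize_batch function here is a made-up stand-in for whatever code is misbehaving): have the LLM write something like this, run it, and paste the output back into the chat instead of the raw source.

```python
# diagnose.py -- hypothetical sketch: dump environment, inputs, and outputs,
# then hand the printed output to the LLM rather than asking it to guess.
import json
import sys
import traceback

def tokenize_batch(texts, max_len=8):
    # Stand-in for the buggy function under investigation.
    return [t.lower().split()[:max_len] for t in texts]

def main():
    print("=== environment ===")
    print("python:", sys.version)

    batch = ["Hello World", "", "a b c d e f g h i j"]
    print("=== inputs ===")
    print(json.dumps(batch, indent=2))

    print("=== outputs ===")
    try:
        for text, toks in zip(batch, tokenize_batch(batch)):
            print(f"{text!r} -> len={len(toks)} toks={toks}")
    except Exception:
        # Print the full traceback to stdout so it ends up in the pasted output.
        traceback.print_exc(file=sys.stdout)

if __name__ == "__main__":
    main()
```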


One consequence of much faster internet may be that LLMs move more and more to the frontend (browser): better privacy, better UX, and no serving cost. WebGPU already works pretty well, and one major bottleneck (other than storage) is the speed of downloading these giant models.


Been contemplating the fact that in LLMs, the outlier values are often the most important values. If you remove these outliers, your model performance degrades drastically. It's quite poetic and affirms my belief that "the average is overrated".
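A toy illustration of the claim, with random weights plus a few injected outliers standing in for a real LLM layer (not a faithful quantization experiment): zeroing the few largest-magnitude entries perturbs the layer's output far more than zeroing the same number of random entries.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)
# Inject a handful of large values, mimicking the outlier features seen in LLMs.
idx = rng.choice(W.size, size=64, replace=False)
W.flat[idx] = W.flat[idx] * 50.0

x = rng.standard_normal(1024).astype(np.float32)
y = W @ x

def zero_k_largest(mat, k):
    out = mat.copy()
    drop = np.argsort(np.abs(out), axis=None)[-k:]  # k largest-magnitude entries
    out.flat[drop] = 0.0
    return out

def zero_k_random(mat, k, rng):
    out = mat.copy()
    drop = rng.choice(out.size, size=k, replace=False)  # k random entries
    out.flat[drop] = 0.0
    return out

k = 64
err_outliers = np.linalg.norm(zero_k_largest(W, k) @ x - y) / np.linalg.norm(y)
err_random = np.linalg.norm(zero_k_random(W, k, rng) @ x - y) / np.linalg.norm(y)
print(f"relative error, dropping {k} outliers: {err_outliers:.4f}")
print(f"relative error, dropping {k} random:   {err_random:.4f}")
```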


One cool thing from Gemini's technical report that no one is talking about is that it can take audio signals natively (as opposed to converting audio to text first). This means it can potentially capture speech tone?


LLMs are to small deep learning models what small deep learning models are to hard-coded logic.


Gary’s Notebook 🧙‍♂️ reposted this post

Is it possible to solve NLP tasks simply by following instructions that define the tasks? How can we measure progress? Excited to announce Natural Instructions v2, a collection of 1,600+ diverse language tasks and their expert-written instructions! 📜 arxiv.org/abs/2204.07705


Gary’s Notebook 🧙‍♂️ reposted this post

Everybody wants their models to run faster. However, researchers often cargo-cult performance without a solid understanding of the underlying principles. To address that, I wrote a post called "Making Deep Learning Go Brrrr From First Principles". (1/3) horace.io/brrr_intro.html


Was wondering why these model checkpoint files are so big (~GB). Aren't they just a bunch of floats (~4 bytes each)? Then realized roberta-large is 355M parameters 🤯🤯🤯
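The back-of-the-envelope math, assuming plain fp32 storage and ignoring any extra buffers in the checkpoint:

```python
# 355M parameters * 4 bytes per fp32 float ≈ 1.4 GB on disk
params = 355_000_000
bytes_per_param = 4  # fp32
size_gb = params * bytes_per_param / 1e9
print(f"{size_gb:.2f} GB")  # ~1.42 GB
```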


Gary’s Notebook 🧙‍♂️ reposted this post

We need a dedicated collection of Toy Datasets for Machine Learning: 1. They can be more interesting than real datasets, especially if designed to be hard for certain algorithms. 2. They are more useful for teaching / learning. Maybe @huggingface / @kaggle can help with this?
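A minimal sketch of what "hard for certain algorithms" could look like, using scikit-learn's two-moons generator as a stand-in for such a toy dataset: a linear model struggles by construction, while a tiny MLP does fine.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A non-linearly-separable toy dataset: hard for linear models by design.
X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LogisticRegression().fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X_tr, y_tr)

# Expect the linear model's accuracy to top out well below the small MLP's.
print("logistic regression:", linear.score(X_te, y_te))
print("small MLP:          ", mlp.score(X_te, y_te))
```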


Gary’s Notebook 🧙‍♂️ reposted this post

Got hit-and-run by a white work van on Atlantic and Hellman on 9/24, 7:21pm. Neck is a bit sore but otherwise I'm ok. I'm offering $1k cash or $2k to charity of your choice for first person to send me dash cam footage clearly showing the van's license plate (ends in 5G)


Gary’s Notebook 🧙‍♂️ reposted this post

Stanford's ~entire AI Department has just released a 200-page, 100-author Neural Scaling Laws Manifesto. They're pivoting to positioning themselves as #1 at academic ML Scaling (e.g. GPT-4) research. "On the Opportunities and Risks of Foundation Models" arxiv.org/abs/2108.07258

