Duc Anh Nguyen
@duc_anh2k2
AI Resident @ Qualcomm (Ex VinAI Research) #MachineLearning #Mixtureofexperts #Mamba #PEFT
Alex is currently on the faculty job market! He is an incredible researcher! In fact, my interest in Gaussian processes was sparked by his work.
Today, I gave a talk at the INFORMS Job Market Showcase! If you're interested, here are the slides - link below!
Nice cheat sheet for LLM terminology!
you are:
- a random CS grad with 0 clue how LLMs work
- get tired of people gatekeeping with big words and tiny GPUs
- decide to go full monk mode
- 2 years later i can explain attention mechanisms at parties and ruin them
here’s the forbidden knowledge map - top to bottom,…
Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…
Gemma 3 270M! Great to see another awesome, small open-weight LLM for local tinkering. Here's a side-by-side comparison with Qwen3. Biggest surprise: it only has 4 attention heads!
🎯 Andrej Karpathy on how to learn.
Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ YaRN (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: StreamingLLM, arxiv.org/abs/2309.17453)
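To make the first item concrete, here is a minimal sketch of a sliding-window causal attention mask in plain PyTorch. The function name `sliding_window_causal_mask`, the window size of 4, and the toy tensor shapes are illustrative assumptions, not GPT-OSS's actual configuration.

```python
import torch
import torch.nn.functional as F

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: position i may attend to j only if i - window < j <= i."""
    i = torch.arange(seq_len).unsqueeze(1)   # (seq_len, 1)
    j = torch.arange(seq_len).unsqueeze(0)   # (1, seq_len)
    return (j <= i) & (j > i - window)

# Toy single-head attention restricted to a local window (window=4 here).
seq_len, d = 8, 16
q = torch.randn(1, seq_len, d)
k = torch.randn(1, seq_len, d)
v = torch.randn(1, seq_len, d)

mask = sliding_window_causal_mask(seq_len, window=4)
scores = (q @ k.transpose(-2, -1)) / d ** 0.5
scores = scores.masked_fill(~mask, float("-inf"))  # block everything outside the window
out = F.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([1, 8, 16])
```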
what are large language models actually doing? i read the 2025 textbook "Foundations of Large Language Models" by tong xiao and jingbo zhu and for the first time, i truly understood how they work. here’s everything you need to know about llms in 3 minutes↓
As promised, my SOP draft is here: algoroxyolo.github.io/assets/pdf/lrz… Please lmk if you have any suggestions, recommendations on where you think I should apply, or thoughts on what I should pursue in future research. As always RT appreciated!! #PhDApplication #NLP #HCI
New lecture recordings on RL+LLM! 📺 This spring, I gave a lecture series titled **Reinforcement Learning of Large Language Models**. I have decided to re-record these lectures and share them on YouTube. (1/7)
Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix that enables length generalization on sequences of up to 256k tokens, with no need to change the architecture!
Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition! We extracted interpretable attention units in LMs and found finer-grained versions of many known and novel attention behaviors. 🧵1/N
Statistical Learning Theory by Percy Liang web.stanford.edu/class/cs229t/n…
New paper - Transformers, but without normalization layers (1/n)
I've been reading this book alongside Deepseek. The math is mathing. The code is coding. The Deepseek is deepseeking! @deepseek_ai you made god!
Stanford “Statistics and Information Theory” lecture notes PDF: web.stanford.edu/class/stats311…
Stanford "Stochastic Processes" lecture notes PDF: adembo.su.domains/math-136/nnote…
Excited to share a new project! 🎉🎉 doi.org/10.1101/2024.0… How do we navigate between brain states when we switch tasks? Are dynamics driven by control, or passive decay of the prev task? To answer, we compare high-dim linear dynamical systems fit to EEG and RNNs🌀 ⏬
Announcing MatMamba - an elastic Mamba2 🐍 architecture with 🪆 Matryoshka-style training and adaptive inference. Train a single elastic model, get 100s of nested submodels for free! Paper: sca.fo/mmpaper Code: sca.fo/mmcode 🧵(1/10)
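For intuition on the "nested submodels for free" idea, here is a heavily simplified Matryoshka-style sketch in PyTorch: a single linear layer whose leading output units form usable sub-layers. `MatryoshkaLinear`, its `width` argument, and the chosen widths are hypothetical illustrations, not MatMamba's actual architecture or training recipe.

```python
import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    """Linear layer whose leading output units form nested sub-layers.

    Slicing the weight matrix to the first `width` output features yields a
    smaller, still-usable layer that shares parameters with the full one.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.full = nn.Linear(in_features, out_features)

    def forward(self, x, width=None):
        if width is None:
            return self.full(x)
        # Use only the first `width` output units: a nested submodel "for free".
        return nn.functional.linear(x, self.full.weight[:width], self.full.bias[:width])

# Train with losses at several nested widths so every submodel stays usable.
layer = MatryoshkaLinear(64, 256)
x = torch.randn(8, 64)
for w in (32, 64, 128, 256):   # nested widths, smallest to full
    out = layer(x, width=w)    # shape: (8, w)
    print(w, out.shape)
```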
A cool GitHub repo collecting LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
Excited to share a blog series I've been working on, diving deep into CUDA programming! Inspired by the #PMPP book & #CUDA_MODE!! Check out the links below...
[VAE] by Hand ✍️ A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure. In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the…
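A minimal sketch of that idea in PyTorch: the encoder predicts a mean and log-variance, the reparameterization trick samples a latent, and the decoder reconstructs. `TinyVAE`, the layer sizes, and the MSE reconstruction term are illustrative choices, not the hand-drawn example's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal VAE: the encoder outputs a mean and log-variance per latent
    dimension; sampling from that Gaussian and decoding generates new data
    from the learned structure."""
    def __init__(self, x_dim=784, z_dim=16, h_dim=128):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        x_hat = self.dec(z)
        # Reconstruction term + KL divergence to the standard-normal prior.
        recon = F.mse_loss(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return x_hat, recon + kl

vae = TinyVAE()
x = torch.rand(4, 784)          # e.g. four flattened 28x28 images
x_hat, loss = vae(x)
print(x_hat.shape, loss.item())
```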
EVLM An Efficient Vision-Language Model for Visual Understanding In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the
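For context, here is a rough sketch of the LLaVA-style pattern the tweet describes: project ViT patch features into the LLM's embedding space and prepend them to the text embeddings as a visual prompt. The dimensions and the `projector` module are assumptions for illustration, not EVLM's actual design.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 576 ViT patch tokens of dim 1024, an LLM with hidden size 2048.
vit_dim, llm_dim, n_patches, n_text = 1024, 2048, 576, 32

# A single projection maps frozen ViT patch features into the LLM's embedding
# space; the projected "visual prompt" is simply prepended to the text embeddings.
projector = nn.Linear(vit_dim, llm_dim)

vit_features = torch.randn(1, n_patches, vit_dim)   # single-layer ViT features
text_embeds = torch.randn(1, n_text, llm_dim)       # token embeddings from the LLM

visual_prompt = projector(vit_features)                       # (1, 576, 2048)
llm_inputs = torch.cat([visual_prompt, text_embeds], dim=1)   # (1, 608, 2048)
print(llm_inputs.shape)
```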