
Duc Anh Nguyen

@duc_anh2k2

AI Resident @ Qualcomm (Ex VinAI Research) #MachineLearning #Mixtureofexperts #Mamba #PEFT

Duc Anh Nguyen reposted

Alex is currently on the faculty job market! He is an incredible researcher! In fact, my interest in Gaussian processes was sparked by his work.

Today, I gave a talk at the INFORMS Job Market Showcase! If you're interested, here are the slides - link below!



Duc Anh Nguyen reposted

Nice cheat sheet for LLM terminology!

- you are
- a random CS grad with 0 clue how LLMs work
- get tired of people gatekeeping with big words and tiny GPUs
- decide to go full monk mode
- 2 years later i can explain attention mechanisms at parties and ruin them
- here’s the forbidden knowledge map
- top to bottom,…



Duc Anh Nguyen reposted

Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…


Gemma 3 270M! Great to see another awesome, small open-weight LLM for local tinkering. Here's a side-by-side comparison with Qwen3. The biggest surprise is that it only has 4 attention heads!

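For intuition on why such a small head count stands out, the per-head dimension is just the embedding width divided by the number of heads. A minimal sketch (the numbers below are illustrative, not Gemma 3 270M's actual config):

import torch
import torch.nn as nn

# Illustrative numbers only -- not Gemma 3 270M's actual config.
emb_dim, num_heads = 640, 4
head_dim = emb_dim // num_heads  # 160 dims per head: fewer heads means each head gets a wider slice

attn = nn.MultiheadAttention(embed_dim=emb_dim, num_heads=num_heads, batch_first=True)
x = torch.randn(1, 16, emb_dim)   # (batch, seq_len, emb_dim) toy input
out, _ = attn(x, x, x)            # self-attention over the toy sequence
print(out.shape, head_dim)        # torch.Size([1, 16, 640]) 160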


Duc Anh Nguyen reposted

🎯 Andrej Karpathy on how to learn.


Duc Anh Nguyen reposted

Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ YaRN (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: streaming llm arxiv.org/abs/2309.17453)
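
A rough sketch of item 1, sliding window attention: a causal mask that additionally drops keys older than a fixed window. The window size and shapes here are made up for illustration, not GPT-OSS's actual values.

import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: query i may attend to keys j with i - window < j <= i."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions (column vector)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions (row vector)
    return (j <= i) & (j > i - window)       # causal AND within the window

mask = sliding_window_causal_mask(seq_len=8, window=4)
scores = torch.randn(8, 8)                          # toy attention logits
scores = scores.masked_fill(~mask, float("-inf"))   # block out-of-window keys
attn = scores.softmax(dim=-1)                       # each row sums to 1 over the allowed keys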


Duc Anh Nguyen reposted

what are large language models actually doing?

i read the 2025 textbook "Foundations of Large Language Models" by tong xiao and jingbo zhu and for the first time, i truly understood how they work.

here’s everything you need to know about llms in 3 minutes↓


Duc Anh Nguyen reposted

As promised, my SOP draft is here: algoroxyolo.github.io/assets/pdf/lrz… Please lmk if you have any suggestions, recommendations on where you think I should apply, or thoughts on what I should do in my future research. As always, RT appreciated!! #PhDApplication #NLP #HCI


Duc Anh Nguyen reposted

New lecture recordings on RL+LLM! 📺 This spring, I gave a lecture series titled **Reinforcement Learning of Large Language Models**. I have decided to re-record these lectures and share them on YouTube. (1/7)


Duc Anh Nguyen reposted

Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization on sequences of up to 256k tokens, with no need to change the architecture!


Duc Anh Nguyen reposted

Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition! We extracted interpretable attention units in LMs and found finer-grained versions of many known and novel attention behaviors. 🧵1/N


Duc Anh Nguyen reposted

Statistical Learning Theory by Percy Liang web.stanford.edu/class/cs229t/n…


Duc Anh Nguyen reposted

New paper - Transformers, but without normalization layers (1/n)

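If this is the Dynamic Tanh (DyT) line of work, the idea is to replace LayerNorm with an elementwise tanh under a learnable scale. A minimal sketch under that assumption (check the paper's official code for exact details):

import torch
import torch.nn as nn

class DyT(nn.Module):
    """Sketch of a DyT-style drop-in replacement for LayerNorm:
    elementwise tanh with a learnable scalar alpha plus a per-channel affine.
    My reading of the idea, not the authors' reference implementation."""
    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1) * alpha_init)  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))              # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))              # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta

x = torch.randn(2, 16, 512)
print(DyT(512)(x).shape)  # torch.Size([2, 16, 512]) -- same shape a LayerNorm would return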

Duc Anh Nguyen reposted

I've been reading this book alongside Deepseek. The math is mathing. The code is coding. The Deepseek is deepseeking! @deepseek_ai you made god!


Duc Anh Nguyen reposted

Stanford “Statistics and Information Theory” lecture notes PDF: web.stanford.edu/class/stats311…


Stanford "Stochastic Processes" lecture notes PDF: adembo.su.domains/math-136/nnote…



Duc Anh Nguyen reposted

Excited to share a new project! 🎉🎉 doi.org/10.1101/2024.0… How do we navigate between brain states when we switch tasks? Are dynamics driven by control, or passive decay of the prev task? To answer, we compare high-dim linear dynamical systems fit to EEG and RNNs🌀 ⏬


Duc Anh Nguyen reposted

Announcing MatMamba - an elastic Mamba2🐍architecture with🪆Matryoshka-style training and adaptive inference.

Train a single elastic model, get 100s of nested submodels for free!

Paper: sca.fo/mmpaper
Code: sca.fo/mmcode
🧵(1/10)

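Not MatMamba's actual mechanism, but a generic sketch of what Matryoshka-style nesting means: narrower submodels reuse a prefix of the full model's width, so they come for free from one set of shared weights. All names and sizes below are illustrative.

import torch
import torch.nn as nn

class MatryoshkaLinear(nn.Module):
    """Toy illustration of Matryoshka-style nesting (not MatMamba's code):
    one full-width weight matrix from which narrower submodels are sliced."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.bias = nn.Parameter(torch.zeros(d_out))

    def forward(self, x, width=None):
        d = width or self.weight.shape[0]
        # A width-d submodel just uses the first d output dims of the shared weights.
        return x @ self.weight[:d].T + self.bias[:d]

layer = MatryoshkaLinear(256, 256)
x = torch.randn(4, 256)
full = layer(x)              # full-width model
small = layer(x, width=64)   # nested submodel, no extra parameters
# Matryoshka-style training would add a loss term at each nested width.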

Duc Anh Nguyen reposted

A cool GitHub repo collecting LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.


Duc Anh Nguyen reposted

Excited to share a blog series I've been working on, diving deep into CUDA programming! Inspired by the #PMPP book & #CUDA_MODE!! Check out the links below...


Duc Anh Nguyen reposted

[VAE] by Hand ✍️

A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure. In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the…
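
A minimal sketch of that idea: the encoder predicts a mean and log-variance, and new samples are drawn from that learned distribution via the reparameterization trick. Dimensions here are arbitrary.

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE sketch: the encoder outputs a mean and log-variance,
    and the decoder reconstructs from a sample of that learned distribution."""
    def __init__(self, x_dim: int = 784, z_dim: int = 16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # predicts [mu, log_var]
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization trick
        return self.dec(z), mu, log_var

x = torch.rand(8, 784)
recon, mu, log_var = TinyVAE()(x)
# Training minimizes reconstruction error plus a KL term pulling (mu, log_var) toward N(0, I):
kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1).mean()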


Duc Anh Nguyen reposted

EVLM

An Efficient Vision-Language Model for Visual Understanding

In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the…

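A rough sketch of that LLaVA-style pattern (module names and dimensions below are placeholders, not EVLM's actual code): ViT patch features are projected into the LLM's embedding space and prepended to the text tokens.

import torch
import torch.nn as nn

# Placeholder dimensions, not EVLM's real config.
vit_dim, llm_dim = 1024, 4096
projector = nn.Linear(vit_dim, llm_dim)     # maps ViT features into the LLM embedding space

vit_feats = torch.randn(1, 256, vit_dim)    # one image -> 256 patch features from a single ViT layer
text_embeds = torch.randn(1, 32, llm_dim)   # embedded text prompt tokens

visual_prompt = projector(vit_feats)                          # (1, 256, llm_dim)
llm_inputs = torch.cat([visual_prompt, text_embeds], dim=1)   # image tokens prepended to text
print(llm_inputs.shape)                                       # torch.Size([1, 288, 4096])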
