Alexander H. Liu

@alex_h_liu

Ph.D. Student @MIT_CSAIL

Massachusetts, USA

alexander-h-liu.github.io

Joined July 2013

25 posts, 278 followers, 147 following

Alexander H. Liu reposted

Soham

@sohamg121

Jul 18

The Voxtral tech-report is up! arxiv.org/abs/2507.13264 We release these models with a permissive Apache 2.0 license. Feedback is welcome! We have a lot more cooking, this is just the beginning.

Alexander H. Liu reposted

@hjchang87

💡Bridging speech, sound, & music representations with one universal model? We introduce USAD ✅ 📚 Distills knowledge from domain-specific SSL models 🎯 Matches expert models across speech/audio/music tasks 📄 arxiv.org/abs/2506.18843 🧑‍💻 huggingface.co/MIT-SLS/USAD-B…

Alexander H. Liu

@alex_h_liu

Dec 18

Highly recommended!!! (Happy to chat if you’re curious about the experience with the team)

Rafael Valle

@RafaelValleArt

Dec 17

Our team at NVIDIA is continuously looking for highly motivated interns to work on intelligence in audio understanding and synthesis. Please reach out if you would like to collaborate with us!

Alexander H. Liu

@alex_h_liu

Dec 10

Turns out speech self-supervised learning techniques can be generalized to sign language! Great work led by @Shester_G (he’s looking for a PhD opportunity this year!)

Shester Gueuwou

@Shester_G

Dec 4

Ever imagined a foundational model for sign language?! Introducing SHuBERT (Sign Hidden Unit BERT)! With SHuBERT, we get SOTA results on ASL video understanding tasks compared to task-specific models from Google DeepMind, Meta, and Microsoft, while using less compute! 🧵 1/9

Alexander H. Liu reposted

Rafael Valle

@RafaelValleArt

Nov 27

💚 Big shoutout to the #FUGATTO team for making this release happen — and to cats like Coltrane and Xenakis, who envisioned a world where "saxophones bark and howl." Together, artists and researchers, let’s build a GPT-like future for audio generation! fugatto.github.io

Alexander H. Liu reposted

Alan Baade

@BaadeAlan

Oct 9, 2024

Q: Why can't we get GPT-level understanding from language models on speech? A: We need better speech tokens! In SyllableLM, *we beat @kyutai_labs Moshi on semantic understanding in 70 hours of training* by making speech tokens at 5 frames/s With @PuyuanPeng, David Harwath 1/n

Alexander H. Liu reposted

Rafael Valle

@RafaelValleArt

Jul 2, 2024

Synthetic labels are amazing! Do you need an audio labelling machine? Audio Flamingo checkpoints are available on github.com/NVIDIA/audio-f… ...and pre-training with synthetic labels from Audio Flamingo gives large improvements in text-to-audio models arxiv.org/abs/2406.15487

Alexander H. Liu

@alex_h_liu

Apr 16, 2024

Looking forward to meeting friends at #ICASSP2024

Alexander H. Liu reposted

Rafael Valle

@RafaelValleArt

Feb 28, 2024

Beautiful work by Alex Liu on generative pre-training for speech with Flow Matching. I just realized it's one of the main components in AudioBox! arxiv.org/abs/2310.16338

Alexander H. Liu reposted

Hung-yi Lee (李宏毅)

@HungyiLee2

Feb 25, 2024

Recent years have witnessed significant developments in audio codec models (an overview figure from arxiv.org/abs/2402.13236). We introduce Codec-SUPERB (arxiv.org/abs/2402.13071) to boost fair and comprehensive comparison. Leaderboard: codecsuperb.com

Alexander H. Liu

@alex_h_liu

Feb 9, 2024

Visiting @WavLab was OWSM

Alexander H. Liu

@alex_h_liu

Dec 19, 2023

Lin-Shan: if no one asked you to attend the closing ceremony, you’re probably not getting the award (and laughed out loud)

Wen-Chin Huang

@unilightwf

Dec 18, 2023

Prof. Lin-Shan Lee remembers all his students… amazing…

Alexander H. Liu reposted

Yuan Gong

@YGongND

Dec 10, 2023

The LTU and LTU-AS code is released. As usual, it is a full release including training and inference code, pretrained checkpoints, and the datasets. We hope these will be useful. Check github.com/YuanGongND/ltu.

GitHub - YuanGongND/ltu: Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand" (source: github.com)

Alexander H. Liu reposted

Shinji Watanabe

@shinjiw_at_cmu

Dec 6, 2023

I'll have a keynote talk at ASRU'23! asru2023.org/motion.asp?sit… See you soon in Taiwan! Actually, ASRU was the first conference that rejected my first-author paper (in 2003). But 20 years later, I was given the opportunity to be a keynote speaker, haha.

Alexander H. Liu reposted

Shinji Watanabe

@shinjiw_at_cmu

Nov 2, 2023

We summarize our lab's activities toward speech foundation models at wavlab.org/activities/202…. We have several other ongoing activities, and they are selected papers presented at ASRU.

Alexander H. Liu reposted

Yuan Gong

@YGongND

Aug 12, 2023

🚀 Our upgraded audio large language model LTU-2 is now hosted on HuggingFace Space at lnkd.in/eJDpsBY4. Please give it a try and let us know what you think 😀

Alexander H. Liu reposted

Andrew Rouditchenko 🇺🇦

@arouditchenko

May 23, 2023

🗣️ Whisper is great for speech recognition, but it only recognizes ~100 languages. What if it wasn't trained on the language that you speak? Happy to introduce my #INTERSPEECH2023 paper comparing Whisper and XLS-R for adaption to unseen languages! arxiv.org/abs/2305.12606