Weiting (Steven) Tan

@weiting_nlp

Ph.D. Candidate at @jhuclsp, Student Researcher @Bytedance Seed | Prev @AIatMeta @Amazon Alexa AI

USA

steventan0110.github.io

Joined July 2021

79 posts, 210 followers, 306 following

Recommended tweets

@n_verma1

@kesnet50

@amuuueller

@boyuan__zheng

@StellaLisy

@PaltaShramay

@ChengleiSi

@Alexir563

@suzyahyah

@ruyimarone

@maieuticlab

@EliasEskin

@ZiyangW00

@VijayTiyyala

@abe_hou

Pinned tweet

Weiting (Steven) Tan

@weiting_nlp

/09/19

I was curious about voice agents, particularly their ability to use tools and reason while listening to a user. So we train voice agents by building a sandbox (based on tau-bench) that uses GPT-4.1 for user simulation and SEED-TTS for speech synthesis. #Agents #ToolUse

Weiting (Steven) Tan reposted

Hirofumi Inaguma

@HirofumiInaguma

/10/24

I contributed to SeamlessM4T and Seamless Interaction at Meta. I’ve worked on ASR, speech translation, TTS, full-duplex speech LLM, audiovisual, and human motion. In particular, I have expertise in real-time streaming modeling. scholar.google.com/citations?user…

Weiting (Steven) Tan reposted

Jubayer Ibn Hamid

@jubayer_hamid

/10/01

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks…

Weiting (Steven) Tan reposted

Sanxing Chen

@sanxing_chen

/09/30

Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So,…

Weiting (Steven) Tan reposted

Jacob Austin

@jacobaustin132

/02/04

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

Weiting (Steven) Tan reposted

Dongfu Jiang

@DongfuJiang

/09/03

🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of…

Dongfu Jiang

@DongfuJiang

/06/01

Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…

Weiting (Steven) Tan reposted

Jason Weston

@jaseweston

/09/03

🌀Diversity Aware RL (DARLING)🌀 📝: arxiv.org/abs/2509.02534 - Jointly optimizes for quality & diversity using a learned partition function - Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k - Works for both non-verifiable & verifiable tasks 🧵1/5

Weiting (Steven) Tan reposted

Benjamin Van Durme

@ben_vandurme

/03/17

Our latest on compressed representations: Key-Value Distillation (KVD). Query-independent transformer compression, with offline supervised distillation.

Weiting (Steven) Tan reposted

DeepSeek

@deepseek_ai

/01/20

🛠️ DeepSeek-R1: Technical Highlights 📈 Large-scale RL in post-training 🏆 Significant performance boost with minimal labeled data 🔢 Math, code, and reasoning tasks on par with OpenAI-o1 📄 More details: github.com/deepseek-ai/De… 🐋 4/n

Weiting (Steven) Tan reposted

JHU Computer Science

@JHUCompSci

/01/09

Congratulations to Prof. Philipp Koehn on being named a Fellow of the @aclmeeting! cs.jhu.edu/news/philipp-k…

The ACL Fellows program recognizes members whose contributions have been extraordinary in terms of scientific and technical excellence, service to the association and the community, and/or educatio...

Philipp Koehn elected as Fellow of the Association for Computational Linguistics

Source: cs.jhu.edu

Weiting (Steven) Tan

@weiting_nlp

/12/17

I had a great time helping host MASC-SLL at Hopkins last year. MASC-SLL is a great opportunity to connect with fellow AI/NLP/Speech researchers. If your organization is in the Mid-Atlantic region and is interested in hosting the event, please reach out!

MASC-SLL Conference

@MASC_Conference

/12/16

📢 Want to host MASC 2025? The 12th Mid-Atlantic Student Colloquium is a one-day event bringing together students, faculty, and researchers from universities and industry in the Mid-Atlantic. Please submit this very short form if you are interested in hosting! Deadline: January 6th

Weiting (Steven) Tan reposted

Tianjian Li

@tli104

/12/06

I have written a blog post explaining why both the chosen and the rejected log-probabilities decrease during DPO and, more interestingly, why this is to some extent a desired phenomenon. Link: tianjianl.github.io/blog/2024/dpo/

Weiting (Steven) Tan reposted

Sherjil Ozair

@sherjilozair

/12/03

Very happy to hear that GANs are getting the Test of Time Award at NeurIPS 2024. The NeurIPS Test of Time Awards are given to papers that have stood the test of time for a decade. I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.

Weiting (Steven) Tan

@weiting_nlp

2024/10/18

Excited to see that SpiritLM is fully open-sourced now. It supports speech and text as both input and output. Please consider trying it at: github.com/facebookresear…

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". - facebookresearch/spiritlm

GitHub - facebookresearch/spiritlm: Inference code for the paper "Spirit-LM Interleaved Spoken and...

Source: github.com

AI at Meta

@AIatMeta

2024/10/18

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from @jpineau1. This work is another important step…