Weiting (Steven) Tan

@weiting_nlp

Ph.D. Candidate at @jhuclsp, Student Researcher @Bytedance Seed | Prev @AIatMeta @Amazon Alexa AI

Pinned Tweet

I was curious about voice agents, particularly their ability to use tools and reason while listening to a user.

So we train voice agents by building a sandbox (based on tau-bench) that uses GPT-4.1 for user simulation and SEED-TTS for speech synthesis.
#Agents #ToolUse

Weiting (Steven) Tan reposted

I contributed to SeamlessM4T and Seamless Interaction at Meta. I’ve worked on ASR, speech translation, TTS, full-duplex speech LLMs, audiovisual modeling, and human motion. In particular, I have expertise in real-time streaming modeling. scholar.google.com/citations?user…


Weiting (Steven) Tan reposted

Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks…

Weiting (Steven) Tan reposted

Most RL for LLMs today is single-step optimization on a given state (e.g., an instruction), which is essentially a bandit setup. But to learn a meta-policy that can solve various bandit problems via in-context trial and error, you need true multi-turn RL over a long horizon. So,…

Weiting (Steven) Tan reposted

Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n

Weiting (Steven) Tan reposted

🚀 Excited to finally share our paper on VerlTool, released today after months of work since the initial release in late May! VerlTool is a high-efficiency, easy-to-use framework for Agentic RL with Tool use (ARLT), built on top of VeRL. It currently supports a wide range of…

Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…

Weiting (Steven) Tan reposted

🌀Diversity Aware RL (DARLING)🌀
📝: arxiv.org/abs/2509.02534
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k
- Works for both non-verifiable & verifiable tasks
🧵1/5

Weiting (Steven) Tan reposted

Our latest on compressed representations: Key-Value Distillation (KVD). Query-independent transformer compression, with offline supervised distillation.

Weiting (Steven) Tan reposted

🛠️ DeepSeek-R1: Technical Highlights

📈 Large-scale RL in post-training
🏆 Significant performance boost with minimal labeled data
🔢 Math, code, and reasoning tasks on par with OpenAI-o1
📄 More details: github.com/deepseek-ai/De…

🐋 4/n

Weiting (Steven) Tan reposted

Congratulations to Prof. Philipp Koehn on being named a Fellow of the @aclmeeting! cs.jhu.edu/news/philipp-k…


I had a great time helping host MASC-SLL at Hopkins last year. MASC-SLL is a great opportunity to connect with fellow AI/NLP/Speech researchers. If your organization is in the Mid-Atlantic region and is interested in hosting the event, please reach out!

📢 Want to host MASC 2025? The 12th Mid-Atlantic Student Colloquium is a one-day event bringing together students, faculty, and researchers from universities and industry in the Mid-Atlantic. Please submit this very short form if you are interested in hosting! Deadline: January 6th



Weiting (Steven) Tan reposted

I have written a blog post offering an explanation of why both the chosen and the rejected log-probabilities decrease during DPO and, more interestingly, why this is to some extent a desired phenomenon. Link: tianjianl.github.io/blog/2024/dpo/


Weiting (Steven) Tan reposted

Very happy to hear that GANs are getting the Test of Time Award at NeurIPS 2024. The NeurIPS Test of Time Awards are given to papers that have stood the test of time for a decade. I took some time to reminisce about how GANs came about and how AI has evolved in the last decade.


Excited to see that SpiritLM is fully open-sourced now. It supports speech and text as both input and output. Please consider trying it at: github.com/facebookresear…

Open science is how we continue to push technology forward and today at Meta FAIR we’re sharing eight new AI research artifacts including new models, datasets and code to inspire innovation in the community. More in the video from @jpineau1. This work is another important step…