Xinting Huang
@timhuangxt
Senior Researcher @TencentGlobal, working on LLMs. Ph.D. at @UniMelb; Ex @BytedanceTalk, @MSFTResearch
Also glad to share: we made it work for instruction-following models! NH2-Mixtral-8x7B ➕ NH2-Solar-10.7B ➕ OpenChat-3.5-7B ➡️ a new SOTA among 7B models on MT-Bench. Check it out: huggingface.co/papers/2402.16…
huggingface.co
Paper page - FuseChat: Knowledge Fusion of Chat Models
Knowledge Fusion of LLMs
Is it possible to merge existing models into a more potent model? We have already seen approaches like weight merging and model ensembling show the potential to do this effectively. This work proposes FuseLLM with the core…
The End of Manual Decoding: Meet AutoDeco
Researchers unveil AutoDeco, a groundbreaking framework that teaches LLMs to control their own decoding strategy. It dynamically predicts temperature & top-p for each token, eliminating manual tuning & enabling natural language control.
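To make the idea concrete, here is a minimal sketch of one decoding step where small heads predict the sampling knobs from the last hidden state. The heads (`temp_head`, `top_p_head`) and their shapes are assumptions for illustration, not AutoDeco's actual architecture:

```python
import torch
import torch.nn.functional as F

def autodeco_step(logits, hidden, temp_head, top_p_head):
    # Predict this step's sampling knobs from the last hidden state.
    # (Hypothetical heads, e.g. nn.Linear(d_model, 1) each.)
    temperature = F.softplus(temp_head(hidden)) + 1e-3  # keep > 0
    top_p = torch.sigmoid(top_p_head(hidden))           # keep in (0, 1)

    probs = F.softmax(logits / temperature, dim=-1)

    # Standard nucleus filtering, but with the *predicted* threshold.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_probs, dim=-1)
    keep = cum - sorted_probs < top_p        # the top token always survives
    filtered = sorted_probs * keep
    filtered = filtered / filtered.sum()

    choice = torch.multinomial(filtered, num_samples=1)
    return sorted_idx[choice]                # next-token id
```

The point of the design is that nothing changes at the sampler level; the model simply supplies the temperature and top-p values that a user would otherwise hand-tune.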
Introducing Search Self-play (SSP, arxiv.org/abs/2510.18821)! We let deep search agents act simultaneously as a task proposer and a problem solver. Through competition and cooperation, their agent capabilities co-evolve and uniformly surpass SOTA performance without supervision!
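As a rough illustration of the loop, here is a sketch under assumed interfaces: `llm`, `search_answer`, and `update_policy` are hypothetical stand-ins, and SSP's actual RL objective and answer verification are richer than this (see arxiv.org/abs/2510.18821):

```python
def self_play_round(llm, search_answer, update_policy, doc):
    # Proposer role: pose a hard search question grounded in `doc`,
    # together with its reference answer.
    question = llm(f"Pose a hard search question answerable from:\n{doc}")
    reference = llm(f"Using the document:\n{doc}\nAnswer: {question}")

    # Solver role: the same model answers with search tools, doc unseen.
    attempt = search_answer(question)

    # Competition: the proposer scores when the solver fails, pushing
    # tasks toward the frontier of the solver's ability; cooperation
    # (e.g. dropping unverifiable questions) keeps the game well-posed.
    solver_reward = float(attempt.strip() == reference.strip())
    update_policy(solver_reward=solver_reward,
                  proposer_reward=1.0 - solver_reward)
```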
🚀 Excited to announce our 🍁Marco‑MT🍁 achieved outstanding results at #WMT2025 General Translation! 🏆 Notably, in English→Chinese it outperformed closed-source leaders like GPT-4.1 and Gemini 2.5 Pro. Among the 13 language pairs we competed in, Marco-MT-Algharb achieves (final…
🌺GPT-4o’s image generation is stunning — but how well does it handle complex scenarios? 🤔 We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly…
These findings resonate with my impressions. AFAIC, structured prompting outperforms CoT & ICL by steering LLMs through workflows. Great to see this ‘rebuttal’ backed by such rigorous analysis — reminds me of the insights in LLMs Cannot Self-Correct. We need more like this!
Do Structured Outputs hurt LLM performance? 🤔 The "Let Me Speak Freely" paper claimed that they do, but new experiments by @dottxtai (the team behind outlines) show they don't if you do it correctly! 👀 TL;DR: 📈 The poor results in "Let Me Speak Freely" came from weak…
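The mechanical core of "doing it correctly" is masking, not prompting: a minimal sketch below, where `allowed_token_ids` would come from a schema compiled to a token-level automaton (the approach libraries like outlines take). The function name is illustrative:

```python
import torch
import torch.nn.functional as F

def constrained_step(logits, allowed_token_ids):
    # Structured generation = zero out every token that would break the
    # schema at this state, then sample from what remains. The model's
    # distribution over the valid tokens is untouched.
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    probs = F.softmax(logits + mask, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

Done this way, constraints only remove invalid continuations, which is why well-implemented structured decoding need not degrade task performance.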
Exciting to see our old friend continuing to push the real-world boundaries of LLM applications (shoutout to MT here)!
🔥Our LLM-powered MT (Marco-MT) has reached large-scale commercial use, leading the industry in both efficiency and cost-effectiveness. 🌏 Revolutionizing translation in e-commerce and beyond! 🚀 🌍 For more details: bloomberg.com/news/videos/20… ✨ Try it now: aidc-ai.com
To Code, or Not To Code? Exploring Impact of Code in Pre-training
discuss: huggingface.co/papers/2408.10…
Including code in the pre-training data mixture, even for models not specifically designed for code, has become common practice in LLM pre-training. While there has been…
🚀Check out VideoVista, a comprehensive video-LMMs evaluation benchmark! videovista.github.io 🚀 Dive into our leaderboard: - 📊 Evaluating 33 Video-LMMs across 27 tasks; - 🥉 The latest GPT-4o-Mini clinches 3rd place; - 🏆 InternLM-XComposer-2.5 emerges as the…
🚀Check out VideoVista, our comprehensive video-LMMs evaluation benchmark! We've assessed 33 Video-LMMs across 27 tasks. Highlights include the latest GPT-4o-Mini, ranked third, and InternLM-XComposer-2.5, the top-performing open-source model. More: videovista.github.io
Open-sourced Multimodal models -- fascinating Open-sourced MOE models -- fascinating Open-sourced Multimodal MOE models -- WOW! check this out 👇
🥳We introduce Uni-MoE, a unified multimodal LLM based on sparse MoE architecture. It integrates 📹 video, 🖼️ image, 📄 text, 🔊 audio, and 🗣️ speech, supporting 8+ experts in parallel training across mixed modalities. 🌈Paper: arxiv.org/abs/2405.11273. 💐Project (Code, Data,…
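For readers new to sparse MoE, here is a minimal top-k routed layer in PyTorch. Dimensions and expert shapes are illustrative; Uni-MoE's modality-specific experts and training recipe go well beyond this generic pattern:

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Generic top-k sparse mixture-of-experts layer (illustrative)."""
    def __init__(self, dim=512, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)   # (tokens, n_experts)
        weights, idx = gates.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        # Each token activates only its top-k experts: compute scales
        # with the expert count, not with every expert for every token.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = idx[:, slot] == e
                if sel.any():
                    out[sel] += weights[sel, slot, None] * expert(x[sel])
        return out
```

The sparsity is what makes "8+ experts in parallel" tractable: adding experts grows capacity without growing the per-token compute.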
🚀 A game-changing benchmark: LLM-Uncertainty-Bench 🌟 📚 We introduce "Benchmarking LLMs via Uncertainty Quantification", which challenges the status quo in LLM evaluation. 💡 Uncertainty matters too: we propose a novel uncertainty-aware metric and test 8 LLMs across 5…
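A sketch of the general idea, using split conformal prediction on multiple-choice scores (a common uncertainty-quantification recipe, not necessarily the paper's exact one): two models with equal accuracy can need very different prediction-set sizes at the same coverage level, and that set size is the uncertainty signal.

```python
import numpy as np

def avg_prediction_set_size(cal_probs, cal_labels, test_probs, alpha=0.1):
    # cal_probs: (n, k) option probabilities on a calibration split.
    # Nonconformity score: 1 - probability assigned to the true option.
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level)
    # Every option whose score clears the threshold joins the set;
    # larger average sets = a less certain model at the same coverage.
    sets = (1.0 - test_probs) <= q
    return sets.sum(axis=1).mean()
```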
FuseChat: Knowledge Fusion of Chat Models
While training large language models (LLMs) from scratch can indeed yield models with distinct capabilities and strengths, it incurs substantial costs and may result in redundant capabilities. An alternative…
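Conceptually, the fusion step amounts to multi-teacher distillation over token distributions. A minimal sketch, assuming pre-aligned vocabularies; FuseChat's token-alignment and model-merging stages are omitted here:

```python
import torch.nn.functional as F

def fusion_loss(student_logits, teacher_logits_list, weights):
    # Blend the source chat models' next-token distributions...
    fused = sum(w * F.softmax(t, dim=-1)
                for w, t in zip(weights, teacher_logits_list))
    # ...and train the target model to match the blend.
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    fused, reduction="batchmean")
```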