
Xinting Huang

@timhuangxt

Senior Researcher @TencentGlobal, working on LLMs. Ph.D. at @UniMelb; Ex @BytedanceTalk, @MSFTResearch

Pinned

Also glad to share: We made it work for instruction-following models! NH2-Mixtral-8x7B ➕ NH2-Solar-10.7B ➕ OpenChat-3.5-7B ➡️ new SOTA for 7B models on MT-Bench. Check it out: huggingface.co/papers/2402.16…

huggingface.co

Paper page - FuseChat: Knowledge Fusion of Chat Models

Knowledge Fusion of LLMs Is it possible to merge existing models into a more potent model? We have already seen a few ways that show the potential to effectively do this using approaches like weight merging and ensembling of models. This work proposes FuseLLM with the core…

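For intuition, the difference between ensembling and fusion-style distillation mentioned above can be sketched in a few lines. This is a toy illustration with made-up distributions, not FuseLLM's actual objective:

```python
import numpy as np

# Two toy "models": output probability distributions over the same vocabulary.
p_a = np.array([0.6, 0.3, 0.1])
p_b = np.array([0.2, 0.5, 0.3])

# Ensembling averages the output distributions at inference time...
ensemble = (p_a + p_b) / 2

# ...while fusion-style training would distill such a merged distribution
# into a single target model, e.g. via cross-entropy against `ensemble`.
student = np.array([0.5, 0.4, 0.1])
loss = -np.sum(ensemble * np.log(student))
print(ensemble, loss)
```

The appeal of fusion over ensembling is that only one model is kept at inference time.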


Xinting Huang reposted

The End of Manual Decoding: Meet AutoDeco Researchers unveil AutoDeco, a groundbreaking framework that teaches LLMs to control their own decoding strategy. It dynamically predicts temperature & top-p for each token, eliminating manual tuning & enabling natural language control.

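As a rough illustration (not AutoDeco's actual architecture), "per-token decoding control" just means the temperature and top-p fed to the sampler can change at every step instead of being fixed flags. A minimal nucleus sampler that accepts per-step values might look like this, with the two parameters hard-coded where AutoDeco would predict them from the hidden state:

```python
import numpy as np

def sample_with_dynamic_params(logits, temperature, top_p, rng):
    """Sample one token using per-step temperature and top-p (nucleus) values."""
    scaled = logits / max(temperature, 1e-5)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Nucleus filtering: keep the smallest set of tokens whose mass covers top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    keep = cumulative <= top_p
    keep[0] = True  # always keep the most likely token
    mask = np.zeros_like(probs, dtype=bool)
    mask[order[keep]] = True
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])
# In AutoDeco these values would come from learned heads; here they are fixed.
token = sample_with_dynamic_params(logits, temperature=0.7, top_p=0.9, rng=rng)
print(token)
```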

Xinting Huang reposted

Introducing Search Self-play (SSP, arxiv.org/abs/2510.18821)! We let deep search agents act simultaneously as a task proposer and a problem solver. Through competition and cooperation, their agent capabilities co-evolve and uniformly surpass SOTA performance without supervision!

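Loosely, the proposer/solver loop can be sketched as below. This is a toy adversarial reward scheme; the paper's actual reward design (which also encourages proposing solvable, verifiable tasks) differs:

```python
import random

def self_play_round(propose, solve, verify):
    """One round of proposer/solver self-play: the proposer generates a task,
    the solver attempts it, and the two receive opposing rewards."""
    task, reference = propose()
    answer = solve(task)
    solver_reward = 1.0 if verify(answer, reference) else 0.0
    proposer_reward = 1.0 - solver_reward  # toy adversarial: reward unsolved tasks
    return solver_reward, proposer_reward

# Toy stand-ins: the "proposer" emits arithmetic questions with references.
def propose():
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"{a}+{b}", a + b

def solve(task):
    a, b = task.split("+")
    return int(a) + int(b)  # a perfect toy solver

s, p = self_play_round(propose, solve, lambda ans, ref: ans == ref)
print(s, p)
```

In the real setting both roles are the same search agent, and training updates push each side to beat the other, which is where the co-evolution comes from.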

Xinting Huang reposted

🚀 Excited to announce our 🍁Marco‑MT🍁 achieved outstanding results at #WMT2025 General Translation! 🏆 Notably, in English→Chinese it outperformed closed‑source leaders like GPT‑4.1 and Gemini 2.5 Pro. Among 13 language pairs we competed in, Marco-MT-Algharb achieves (final…


Xinting Huang reposted

🌺GPT-4o’s image generation is stunning — but how well does it handle complex scenarios? 🤔 We introduce 🚀CIGEVAL🚀, a novel method to evaluate models' capabilities in Conditional Image Generation 🖼️➕🖼️🟰🖼️. Find out how top models perform when conditions get truly…


These findings resonate with my impressions. AFAIC, structured prompting outperforms CoT & ICL by steering LLMs through workflows. Great to see this ‘rebuttal’ backed by such rigorous analysis — reminds me of the insights in “LLMs Cannot Self-Correct”. We need more like this!

Do Structured Outputs hurt LLM performance? 🤔 The "Let Me Speak Freely" paper claimed that they do, but new experiments by @dottxtai (team behind outlines) show they don’t if you do it correctly! 👀 TL;DR; 📈 The "Let Me Speak Freely" poor results came from weak…

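For context, libraries like outlines enforce structure by constrained decoding: at each step, tokens the schema forbids are masked out before sampling. A minimal greedy sketch of that idea (toy vocabulary and logits, not the outlines API):

```python
import numpy as np

def constrained_step(logits, allowed_ids):
    """Greedy-decode one step, masking out tokens the schema does not allow."""
    masked = np.full_like(logits, -np.inf)
    masked[allowed_ids] = logits[allowed_ids]
    return int(np.argmax(masked))

vocab = ["{", "}", '"name"', ":", "hello", "42"]
logits = np.array([0.1, 2.0, 1.5, 0.3, 2.5, 0.9])
# Suppose a JSON schema says the next token must open an object:
tok = constrained_step(logits, allowed_ids=[0])
print(vocab[tok])
```

Done correctly, this only removes invalid continuations; the model's ranking among valid tokens is untouched, which is why well-implemented structured output need not hurt quality.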


Exciting to see our old friend continuing to push the real-world boundaries of LLM applications (shoutout to MT here)!

🔥Our LLM-powered MT (Marco-MT) has achieved massive commercial use, leading the industry in both efficiency and cost-effectiveness. 🌏 Revolutionizing translation in e-commerce and beyond! 🚀 🌍 For more details: bloomberg.com/news/videos/20… ✨ Try it now: aidc-ai.com



Xinting Huang reposted

To Code, or Not To Code? Exploring Impact of Code in Pre-training discuss: huggingface.co/papers/2408.10… Including code in the pre-training data mixture, even for models not specifically designed for code, has become a common practice in LLMs pre-training. While there has been…


Xinting Huang reposted

🚀Check out VideoVista, a comprehensive video-LMMs evaluation benchmark! videovista.github.io 🚀 Dive into our leaderboard: - 📊 Evaluating 33 Video-LMMs across 27 tasks; - 🥉 The latest GPT-4o-Mini clinches 3rd place; - 🏆 InternLM-XComposer-2.5 emerges as the…

🚀Check out VideoVista, our comprehensive video-LMMs evaluation benchmark! We've assessed 33 Video-LMMs across 27 tasks. Highlights include the latest GPT-4o-Mini, ranked third, and InternLM-XComposer-2.5, the top-performing open-source model. More: videovista.github.io



Open-sourced Multimodal models -- fascinating Open-sourced MOE models -- fascinating Open-sourced Multimodal MOE models -- WOW! check this out 👇

🥳We introduce Uni-MoE, a unified multimodal LLM based on sparse MoE architecture. It integrates 📹 video, 🖼️ image, 📄 text, 🔊 audio, and 🗣️ speech, supporting 8+ experts in parallel training across mixed modalities. 🌈Paper: arxiv.org/abs/2405.11273. 💐Project (Code, Data,…

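The "sparse MoE" part means each input activates only a few experts, chosen by a learned gate, so capacity grows without proportional compute. A minimal single-vector routing sketch (toy shapes, not Uni-MoE's actual layer):

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Sparse MoE layer: route the input to its top-k experts by gate score,
    then combine expert outputs weighted by a softmax over those scores."""
    scores = x @ gate_weights                     # one score per expert
    top = np.argsort(scores)[::-1][:top_k]        # indices of the chosen experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()
    return sum(g * (x @ expert_weights[e]) for g, e in zip(gates, top))

rng = np.random.default_rng(0)
x = rng.normal(size=4)
experts = rng.normal(size=(8, 4, 4))  # 8 experts, each a 4x4 linear map
gate = rng.normal(size=(4, 8))
y = moe_forward(x, experts, gate)
print(y.shape)
```

With top_k=2 of 8 experts, only a quarter of the expert parameters are touched per token, which is what makes scaling to many modality-specific experts tractable.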


Xinting Huang reposted

🚀 A game-changer benchmark: LLM-Uncertainty-Bench 🌟 📚 We introduce "Benchmarking LLMs via Uncertainty Quantification", which challenges the status quo in LLM evaluation. 💡 Uncertainty matters too: we propose a novel uncertainty-aware metric, which tests 8 LLMs across 5…

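Very loosely, uncertainty-aware evaluation of this kind can report the size of a prediction set over the answer options rather than a single top-1 guess: a confident model needs one option to cover the probability mass, an uncertain one needs several. A toy sketch with hypothetical scores (not the paper's calibrated conformal procedure):

```python
import numpy as np

def prediction_set(option_probs, threshold):
    """Return the options kept when their cumulative probability first reaches
    the coverage threshold; larger sets signal higher model uncertainty."""
    order = np.argsort(option_probs)[::-1]
    cum = np.cumsum(option_probs[order])
    k = int(np.searchsorted(cum, threshold) + 1)
    return sorted(order[:k].tolist())

# Hypothetical softmax scores over four MCQ options (A-D) for two questions.
confident = np.array([0.90, 0.05, 0.03, 0.02])
uncertain = np.array([0.30, 0.28, 0.22, 0.20])
print(len(prediction_set(confident, 0.9)))  # small set: low uncertainty
print(len(prediction_set(uncertain, 0.9)))  # large set: high uncertainty
```

Two models with identical accuracy can then still be distinguished by how large their prediction sets are on average.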

Xinting Huang reposted

FuseChat Knowledge Fusion of Chat Models While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, this approach incurs substantial costs and may lead to potential redundancy in competencies. An alternative…

