DocXavi's profile picture. Applied scientist at @amazonscience Barcelona, Catalonia. Made at @la_upc & @columbia. Promoting @dlbcnai. Opinions my own.

Xavi Giró

@DocXavi

Applied scientist at @amazonscience Barcelona, Catalonia. Made at @la_upc & @columbia. Promoting @dlbcnai. Opinions my own.

Xavi Giró reposted

Congratulations @TencentHunyuan for the best image generation model in the world. It comes with a fantastic paper that describes the data pipeline in great detail. More detail on the RL part would be great 😉

🏆HunyuanImage 3.0 has taken the #1 spot in @arena, ranked as both the top overall and top open-source Text-to-Image model. This achievement came just one week after release and followed a week at the top of Hugging Face trend list. Big thanks to the community for the incredible…

TencentHunyuan's tweet image. 🏆HunyuanImage 3.0 has taken the #1 spot in @arena, ranked as both the top overall and top open-source Text-to-Image model. This achievement came just one week after release and followed a week at the top of Hugging Face trend list. Big thanks to the community for the incredible…
TencentHunyuan's tweet image. 🏆HunyuanImage 3.0 has taken the #1 spot in @arena, ranked as both the top overall and top open-source Text-to-Image model. This achievement came just one week after release and followed a week at the top of Hugging Face trend list. Big thanks to the community for the incredible…


Xavi Giró reposted

Leaving Meta and PyTorch I'm stepping down from PyTorch and leaving Meta on November 17th. tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me. Eleven years…

soumithchintala's tweet image. Leaving Meta and PyTorch

I'm stepping down from PyTorch and leaving Meta on November 17th.

tl;dr: Didn't want to be doing PyTorch forever, seemed like the perfect time to transition right after I got back from a long leave and the project built itself around me.

Eleven years…

Xavi Giró reposted

We have unlocked parallel training of non-linear RNNs! > LSTM entered the chat 🔥

𝗣𝗮𝗿𝗮𝗥𝗡𝗡: 𝗨𝗻𝗹𝗼𝗰𝗸𝗶𝗻𝗴 𝗣𝗮𝗿𝗮𝗹𝗹𝗲𝗹 𝗧𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝗼𝗳 𝗡𝗼𝗻𝗹𝗶𝗻𝗲𝗮𝗿 𝗥𝗡𝗡𝘀 𝗳𝗼𝗿 𝗟𝗟𝗠𝘀 For years, we’ve given RNNs for doomed, and looked at Transformer as 𝘁𝗵𝗲 LLM—but we just needed better math 📄arxiv.org/abs/2510.21450 💻github.com/apple/ml-parar…



Xavi Giró reposted

Check out the list of #WACV2026 workshops at wacv.thecvf.com/Conferences/20…. Many have paper submission deadlines in the next month!

wacv_official's tweet image. Check out the list of #WACV2026 workshops at wacv.thecvf.com/Conferences/20…. Many have paper submission deadlines in the next month!

Xavi Giró reposted

Good thread! I'd also add that faculty positions are often decently paid, give you the freedom to do work that benefits society, and, if everything works out, you can keep the job for life so you don't need to worry about saving 50% of your income and all of that FIRE nonsense.

Even before @mmitchell_ai recently raised this discussion, I've had conversation after conversation with students & new grads struggling with this exact dilemma. I want to help! Here's a live thread of AI-related opportunities for those looking to do good & make (enough) money:

rajiinio's tweet image. Even before @mmitchell_ai recently raised this discussion, I've had conversation after conversation with students & new grads struggling with this exact dilemma.

I want to help! Here's a live thread of AI-related opportunities for those looking to do good & make (enough) money:


Xavi Giró reposted

✨The source code for our paper “Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling” is now released! ✨ Given a video, its audio, and a cast list for each episode, the model can automatically generate subtitles with speaker names.

huh_jaesung's tweet image. ✨The source code for our paper “Look, Listen and Recognise: Character-Aware Audio-Visual Subtitling” is now released! ✨
Given a video, its audio, and a cast list for each episode, the model can automatically generate subtitles with speaker names.

It was Spring 2015, and it was the first time I was teaching deep learning, @UPCTelecos Barcelona. Only two students posed questions that day. 10 years later, they are both scientists at @GoogleDeepMind. As said in @AmazonScience : “learn and be curious”.

DocXavi's tweet image. It was Spring 2015, and it was the first time I was teaching deep learning, @UPCTelecos Barcelona. Only two students posed questions that day. 

10 years later, they are both scientists at @GoogleDeepMind.

As said in @AmazonScience : “learn and be curious”.

Xavi Giró reposted

Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!

Tired to go back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core…

JCJesseLai's tweet image. Tired to go back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on!

📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon.

It traces the core…


Xavi Giró reposted

Congratulations to @Yoshua_Bengio, founder and scientific advisor of Mila, who has become the first researcher in the world to surpass one million citations on Google Scholar, the leading platform for academic and scientific research. A remarkable milestone that highlights the…

Our Founder and Scientific Director @Yoshua_Bengio has become the first living researcher to surpass 1 million citations on Google Scholar, a testament to the foundational and global impact of his work. Congratulations Yoshua!

LawZero_'s tweet image. Our Founder and Scientific Director @Yoshua_Bengio has become the first living researcher to surpass 1 million citations on Google Scholar, a testament to the foundational and global impact of his work. Congratulations Yoshua!


This is my annual post claiming that the program of #DLBCN 2025 this year is even better than the previous one.

The 9 spotlights talks of #DLBCN 2025 have been announced in our site: sites.google.com/view/dlbcn2025… The selection was made based on both quality & diversity criteria.

dlbcnai's tweet image. The 9 spotlights talks of #DLBCN 2025 have been announced in our site:

sites.google.com/view/dlbcn2025…

The selection was made based on both quality & diversity criteria.
dlbcnai's tweet image. The 9 spotlights talks of #DLBCN 2025 have been announced in our site:

sites.google.com/view/dlbcn2025…

The selection was made based on both quality & diversity criteria.
dlbcnai's tweet image. The 9 spotlights talks of #DLBCN 2025 have been announced in our site:

sites.google.com/view/dlbcn2025…

The selection was made based on both quality & diversity criteria.


Xavi Giró reposted

And yet another paper within our series of works on music matching. Now... 🥁🥁🥁 Music sampling! Complementing fingerprinting, version ID, and lyrics matching systems, detecting music sampling is key given modern music practice. SOTA results with a number of clever tricks.

Eminem sampled Aerosmith, 50 Cent sampled Nina Simone, everybody sampled Chic... Many great songs sampled existing ones! Detecting this is the topic of our latest paper with @serrjoa at @SonyAI Barcelona 😎 tl;dr: multi-track dataset + few tricks = +18% boost over SOTA 🚀 1/N

howariou's tweet image. Eminem sampled Aerosmith, 50 Cent sampled Nina Simone, everybody sampled Chic... Many great songs sampled existing ones!

Detecting this is the topic of our latest paper with @serrjoa at @SonyAI Barcelona 😎

tl;dr: multi-track dataset + few tricks = +18% boost over SOTA 🚀

1/N


Xavi Giró reposted

UPC will award an honoris causa doctorate to Oriol Vinyals (@OriolVinyalsML) from @GoogleDeepMind. telecos.upc.edu/ca/noticies/la… Oriol was a keynote speaker in #DLBCN 2019:

dlbcnai's tweet image. UPC will award an honoris causa doctorate to Oriol Vinyals (@OriolVinyalsML) from @GoogleDeepMind. 

telecos.upc.edu/ca/noticies/la…

Oriol was a keynote speaker in #DLBCN 2019:

Xavi Giró reposted

When receiving the Everingham Prize yesterday, I gave a short presentation on the progress of vision-language research over the last decade. Slides (with transcript of my speech in the notes section): docs.google.com/presentation/d… Video of the talk: photos.app.goo.gl/bMrE8hHSiiN98z… Students…

As part of the award ceremony, @aagrawalAA presented a recap of vision-and-language research over the last decade — solved problems, progress, and open-challenges for mutimodal LLMs. Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.…

DhruvBatra_'s tweet image. As part of the award ceremony, @aagrawalAA presented a recap of vision-and-language research over the last decade — solved problems, progress, and open-challenges for mutimodal LLMs.

Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.…
DhruvBatra_'s tweet image. As part of the award ceremony, @aagrawalAA presented a recap of vision-and-language research over the last decade — solved problems, progress, and open-challenges for mutimodal LLMs.

Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.…
DhruvBatra_'s tweet image. As part of the award ceremony, @aagrawalAA presented a recap of vision-and-language research over the last decade — solved problems, progress, and open-challenges for mutimodal LLMs.

Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.…
DhruvBatra_'s tweet image. As part of the award ceremony, @aagrawalAA presented a recap of vision-and-language research over the last decade — solved problems, progress, and open-challenges for mutimodal LLMs.

Solved: robustness to paraphrasing and false premises, OCR, world-knowledge based reasoning.…


Xavi Giró reposted

👋Say Hi to MiMo-Audio! Our BREAKTHROUGH in general-purpose audio intelligence. 🎯 Scaling pretraining to 100M+ hours leads to EMERGENCE of few-shot generalization across diverse audio tasks! 🔥 Post-trained MiMo-Audio-7B-Instruct: • crushes benchmarks: SOTA on MMSU, MMAU,…

_TobiasLee's tweet image. 👋Say Hi to MiMo-Audio! Our BREAKTHROUGH in general-purpose audio intelligence.

🎯 Scaling pretraining to 100M+ hours leads to EMERGENCE of few-shot generalization across diverse audio tasks!

🔥 Post-trained MiMo-Audio-7B-Instruct:
• crushes benchmarks: SOTA on MMSU, MMAU,…

Xavi Giró reposted

The @karpathy interview 0:00:00 – AGI is still a decade away 0:30:33 – LLM cognitive deficits 0:40:53 – RL is terrible 0:50:26 – How do humans learn? 1:07:13 – AGI will blend into 2% GDP growth 1:18:24 – ASI 1:33:38 – Evolution of intelligence & culture 1:43:43 - Why self…


🚀 Cut Your Image Review Costs with Smart AutoQA! ✨ The magic formula: As long as your AutoQA precision beats your GenAI accuracy, you're saving money and time. arxiv.org/abs/2510.16179

DocXavi's tweet image. 🚀 Cut Your Image Review Costs with Smart AutoQA!

✨ The magic formula: As long as your AutoQA precision beats your GenAI accuracy, you're saving money and time.

arxiv.org/abs/2510.16179

Xavi Giró reposted

In my blog post on latents for generative modelling, I pointed out that representation learning and reconstruction are two separate tasks (§6.3), which autoencoders try to solve simultaneously. Separating them makes sense. It opens up a lot of possibilities, as this work shows!

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)

sainingxie's tweet image. three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right.

today, we introduce Representation Autoencoders (RAE).

>> Retire VAEs. Use RAEs. 👇(1/n)


Xavi Giró reposted

We wrote a book about representation learning! It’s fully open source, available and readable online, and covers everything from theoretical foundations to practical algorithms. 👷‍♂️ We’re hard at work updating the content for v2.0, and would love your feedback and contributions

_sdbuchanan's tweet image. We wrote a book about representation learning!

It’s fully open source, available and readable online, and covers everything from theoretical foundations to practical algorithms.

👷‍♂️ We’re hard at work updating the content for v2.0, and would love your feedback and contributions

Xavi Giró reposted

José M. Álvarez from NVIDIA will be the keynote speaker of #DLBCN 2025. Dr Álvarez leads the Autonomous Vechicle Applied Research Group at Nvidia, CA, USA (@NVIDIAAI ). Watch his interview with @neurofregides in 2024: youtube.com/shorts/x6HBanJ…

dlbcnai's tweet image. José M. Álvarez from NVIDIA will be the keynote speaker of #DLBCN 2025. Dr Álvarez leads the Autonomous Vechicle Applied Research Group at Nvidia, CA, USA (@NVIDIAAI ). 

Watch his interview with @neurofregides in 2024:
youtube.com/shorts/x6HBanJ…

Xavi Giró reposted

If you think @Apple is not doing much in AI, you're getting blindsided by the chatbot hype and not paying enough attention! They just released FastVLM and MobileCLIP2 on @huggingface. The models are up to 85x faster and 3.4x smaller than previous work, enabling real-time vision…


Loading...

Something went wrong.


Something went wrong.