Jaskirat Singh @ ICCV2025🌴
@1jaskiratsingh
Ph.D. Candidate at Australian National University | Intern @AIatMeta GenAI | @AdobeResearch | Multimodal Fusion Models and Agents | R2E-Gym | REPA-E
Can we optimize both the VAE tokenizer and diffusion model together in an end-to-end manner? Short Answer: Yes. 🚨 Introducing REPA-E: the first end-to-end tuning approach for jointly optimizing both the VAE and the latent diffusion model using REPA loss 🚨 Key Idea: 🧠…
[Videos are entanglements of space and time.] Around one year ago, we released VSI-Bench, in which we studied visual spatial intelligence: a fundamental but missing pillar of current MLLMs. Today, we are excited to introduce Cambrian-S, our further step that goes beyond visual…
Introducing Cambrian-S it’s a position, a dataset, a benchmark, and a model but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶
Can LLMs accurately aggregate information over long, information-dense texts? Not yet… We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
Introducing Cambrian-S it’s a position, a dataset, a benchmark, and a model but above all, it represents our first steps toward exploring spatial supersensing in video. 🧶
It’s an honor to have received the @QEPrize along with my fellow laureates! But it’s also a responsibility. AI’s impact to humanity is in the hands of all of us.
Today, The King presented The Queen Elizabeth Prize for Engineering at St James's Palace, celebrating the innovations which are transforming our world. 🧠 This year’s prize honours seven pioneers whose work has shaped modern artificial intelligence. 🔗 Find out more:…
you can’t build superintelligence without first building supersensing
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test" But we code to achieve *goals*: maximize revenue, cut costs, win users Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
Check out our work ThinkMorph, which thinks in multi-modalities, not just with them.
🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 arxiv.org/abs/2510.27492 (1/16)
Tests certify functional behavior; they don’t judge intent. GSO, our code optimization benchmark, now combines tests with a rubric-driven HackDetector to identify models that game the benchmark. We found that up to 30% of a model’s attempts are non-idiomatic reward hacks, which…
We added LLM judge based hack detector to our code optimization evals and found models perform non-idiomatic code changes in upto 30% of the problems 🤯
Tests certify functional behavior; they don’t judge intent. GSO, our code optimization benchmark, now combines tests with a rubric-driven HackDetector to identify models that game the benchmark. We found that up to 30% of a model’s attempts are non-idiomatic reward hacks, which…
end-to-end training just makes latent diffusion transformers better! with repa-e, we showed the power of end-to-end training on imagenet. today we are extending it to text-to-image (T2I) generation. #ICCV2025 🌴 🚨 Introducing "REPA-E for T2I: family of end-to-end tuned VAEs for…
With simple changes, I was able to cut down @krea_ai's new real-time video gen's timing from 25.54s to 18.14s 🔥🚀 1. FA3 through `kernels` 2. Regional compilation 3. Selective (FP8) quantization Notes are in 🧵 below
Tired to go back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core…
Back in 2024, LMMs-Eval built a complete evaluation ecosystem for the MLLM/LMM community, with countless researchers contributing their models and benchmarks to raise the whole edifice. I was fortunate to be one of them: our series of video-LMM works (MovieChat, AuroraCap, VDC)…
Throughout my journey in developing multimodal models, I’ve always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly — with full…
I have one PhD intern opening to do research as a part of a model training effort at the FAIR CodeGen team (latest: Code World Model). If interested, email me directly and apply at metacareers.com/jobs/214557081…
United States Xu hướng
- 1. Bills 115K posts
- 2. Josh Allen 7,597 posts
- 3. Jaxson Dart 6,406 posts
- 4. Jonathan Taylor 21.4K posts
- 5. Dolphins 21.6K posts
- 6. Ravens 25.5K posts
- 7. Falcons 31.6K posts
- 8. Colts 53.3K posts
- 9. Henderson 8,477 posts
- 10. Browns 24.4K posts
- 11. Diggs 7,677 posts
- 12. Joe Brady 2,197 posts
- 13. Kyle Williams 5,737 posts
- 14. Justin Fields 2,158 posts
- 15. #Bears 3,879 posts
- 16. Daniel Jones 10.1K posts
- 17. Drake Maye 7,346 posts
- 18. Penix 11.1K posts
- 19. Beane 3,019 posts
- 20. #NYGiants 2,371 posts
Something went wrong.
Something went wrong.