nice
diffusion transformers have come a long way, but most still lean on the old 2021 sd-vae for their latent space. that causes a few big issues: 1. outdated backbones make the architecture more complex than it needs to be. the sd-vae runs at around 450 gflops, while a simple ViT-B…

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
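For intuition, here is a minimal sketch of the RAE idea as described in the thread: freeze a pretrained representation encoder (e.g., a DINOv2 ViT) and train only a lightweight ViT-style decoder that maps its patch tokens back to pixels; the diffusion transformer then denoises in that token space instead of a VAE latent space. The timm model id, decoder size, and patchify details below are my assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import timm  # assumed dependency for the pretrained encoder


class RAESketch(nn.Module):
    """Hedged sketch of a Representation Autoencoder: frozen pretrained ViT
    encoder supplies the latent space; only a small decoder is trained."""

    def __init__(self, patch=14, dim=768):
        super().__init__()
        # assumption: a DINOv2 ViT-B as the frozen representation encoder
        self.encoder = timm.create_model("vit_base_patch14_dinov2", pretrained=True)
        for p in self.encoder.parameters():
            p.requires_grad = False  # the latent space comes from a frozen backbone
        # small trainable transformer decoder back to per-patch pixels
        layer = nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_pixels = nn.Linear(dim, 3 * patch * patch)

    def encode(self, img):
        with torch.no_grad():
            tokens = self.encoder.forward_features(img)
        return tokens[:, 1:]  # drop the class token; prefix handling may differ per model

    def decode(self, tokens):
        return self.to_pixels(self.decoder(tokens))  # per-patch RGB; unfold back to an image


# Training idea: recon = rae.decode(rae.encode(img)); loss = mse(recon, patchify(img)).
# The diffusion transformer is then trained to denoise in rae.encode(img)'s token space.
```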

btw this discovery is by Mehtaab Sawhney (msawhney in the picture) mit.edu/~msawhney who has been doing lots of AWESOME work in combinatorics for many years now!
Mehtaab Sawhney — Clay Research Fellow and Assistant Professor at Columbia University. Research in combinatorics, probability, analytic number theory, and theoretical computer science.
gpt5-pro is superhuman at literature search: it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thread/3…) by realizing that it had actually been solved 20 years ago. h/t @MarkSellke for pointing this out to me!

Things that have happened since Jan 2025:
- Gemini 2.5 Deep Think got a gold medal on the IMO
- an unreleased OpenAI model got a gold medal on the IMO and IOI and a perfect score on the ICPC
- the same unreleased OpenAI model coded for 9 hours autonomously and got second place in the…
Matt Walsh is now "feeling the AGI." Meanwhile there's been ~zero change in real capabilities since we hit Midwit Wall (120 IQ) in January 2025. It's over. Send all these AI scams to zero prompto. 🚨🐻📉

🚀 NorMuon: Muon + neuron-wise adaptive learning rates: +21.7% training efficiency vs Adam, +11.3% vs Muon on 1.1B pretrain. 🚀 Distributed NorMuon: a highly efficient FSDP2 implementation. Paper 👉 arxiv.org/abs/2510.05491 #LLM #AI #DeepLearning #Optimizer
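Reading only the abstract, the core idea sounds like "Muon's orthogonalized momentum update, plus a per-neuron (row-wise) second-moment rescaling." Below is a rough single-device sketch of that reading; it is not the authors' implementation, and details (coefficients, where the normalization is applied, the FSDP2 sharding) surely differ from the paper.

```python
import torch


def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D update, Muon-style (coefficients as in public Muon code)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.clone()
    X = X / (X.norm() + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * A @ A
        X = a * X + B @ X
    return X.T if transposed else X


class NorMuonSketch(torch.optim.Optimizer):
    """Hypothetical sketch in the spirit of NorMuon (arXiv:2510.05491):
    orthogonalized momentum plus a neuron-wise (per output row) adaptive step."""

    def __init__(self, params, lr=0.02, momentum=0.95, beta2=0.95, eps=1e-8):
        super().__init__(params, dict(lr=lr, momentum=momentum, beta2=beta2, eps=eps))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None or p.ndim != 2:
                    continue  # sketch only handles 2D weight matrices
                state = self.state[p]
                if not state:
                    state["m"] = torch.zeros_like(p)
                    state["v"] = torch.zeros(p.shape[0], device=p.device)  # one scalar per neuron (row)
                m, v = state["m"], state["v"]
                m.mul_(group["momentum"]).add_(p.grad)
                u = newton_schulz_orthogonalize(m)
                # neuron-wise adaptive learning rate: second moment of each row of the update
                v.mul_(group["beta2"]).add_(u.pow(2).mean(dim=1), alpha=1 - group["beta2"])
                u = u / (v.sqrt().unsqueeze(1) + group["eps"])
                p.add_(u, alpha=-group["lr"])
```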


Ok, I was unfair to @jm_alexia. I had no time to read the TRM paper, and I should have. It's a very good-faith attempt to rescue the idea of enabling huge effective depth for a tiny model, without any of HRM's shaky premises or unwarranted approximations. It may be a real, big finding.




TRM is one of the best papers I've read in the past few years - it truly shows the unfiltered process of a researcher:
1) See an awesome paper, get hyped about it
2) Read it - looks cool
3) Run it - doesn't work
4) Find glaring mistakes
5) Fix the issues
x.com/jm_alexia/stat…
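For readers who haven't seen TRM: the "huge effective depth from a tiny model" idea is, roughly, to apply one small weight-tied block many times to refine a latent answer. The sketch below is my generic reading of that idea, not the paper's architecture; TRM's actual recursion, halting, and supervision scheme differ.

```python
import torch
import torch.nn as nn


class TinyRecursiveSketch(nn.Module):
    """Generic weight-tied recursion: one small block reused T times,
    so effective depth ~ T * block depth while parameters stay tiny."""

    def __init__(self, dim=128, steps=16):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.steps = steps

    def forward(self, x_embed):
        z = torch.zeros_like(x_embed)  # latent answer, refined in place
        for _ in range(self.steps):    # the same parameters are reused at every step
            z = z + self.block(torch.cat([x_embed, z], dim=-1))
        return z
```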
Steve Jobs predicted ChatGPT in 1985.
🚨 GPT-5 Pro just achieved 2 BIG results in mathematics today!
- Solved a problem that no LLM had solved before, and that only 60 humans had solved prior
- Solved an open problem from the "Real Analysis in Computer Science" open-problems list
Progress in AI seems very gradual, and then all of a sudden.

GPT-5 Pro just disproved a conjecture from the Simons Foundation's "Open Problems" list. Very impressive. I wonder where the goalposts will move now? CC @sebastienbubeck HT @arjunmanrai
GPT-5 Pro found a counterexample to the NICD-with-erasures majority optimality (Simons list, p.25). simons.berkeley.edu/sites/default/… At p=0.4, n=5, f(x) = sign(x_1-3x_2+x_3-x_4+3x_5) gives E|f(x)|=0.43024 vs best majority 0.42904.
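If you want to check the numbers yourself: the quantity in question appears to be E_y |E[f(x) | y]|, where x is uniform on {-1,1}^n and each coordinate is revealed to the observer independently with probability p (erased otherwise); under that convention, exact enumeration gives 0.42904 for Maj_5 at p = 0.4, matching the tweet. A brute-force verification sketch (my reconstruction, not code from the report):

```python
from itertools import product


def erasure_value(f, n, p):
    """Exact E_y |E[f(x) | y]| with x ~ Uniform{-1,1}^n and each coordinate
    revealed independently with probability p (erased otherwise)."""
    total = 0.0
    for revealed in product([0, 1], repeat=n):       # which coordinates the observer sees
        k = sum(revealed)
        weight = (p ** k) * ((1 - p) ** (n - k))
        inner = 0.0
        for z in product([-1, 1], repeat=k):         # revealed values, uniform
            cond = 0.0
            for w in product([-1, 1], repeat=n - k): # average f over the erased coordinates
                x, zi, wi = [], iter(z), iter(w)
                for r in revealed:
                    x.append(next(zi) if r else next(wi))
                cond += f(x)
            inner += abs(cond / 2 ** (n - k))
        total += weight * inner / 2 ** k
    return total


sign = lambda t: 1 if t > 0 else -1
maj5 = lambda x: sign(sum(x))
cand = lambda x: sign(x[0] - 3 * x[1] + x[2] - x[3] + 3 * x[4])

print(erasure_value(maj5, 5, 0.4))  # ~0.42904 (best majority, as reported)
print(erasure_value(cand, 5, 0.4))  # ~0.43024 reported for the counterexample
```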


banger blogpost

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
what a beautiful theory!

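For a concrete handle on the "edge of stability" claim: during full-batch gradient descent you can track the top Hessian eigenvalue (sharpness) and compare it to the 2/lr threshold from classical descent analysis; in many deep-learning runs the sharpness rises until it hovers near that threshold instead of staying safely below it. The toy script below only illustrates how to measure this; it is not the central-flows analysis itself.

```python
import torch
import torch.nn as nn

# Toy full-batch GD run: measure sharpness (top Hessian eigenvalue) and compare to 2/lr.
torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
params = list(model.parameters())
lr = 0.05

def loss_fn():
    return ((model(X) - y) ** 2).mean()

def sharpness(iters=20):
    """Top Hessian eigenvalue via power iteration on Hessian-vector products."""
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
        hv = torch.autograd.grad(sum((g * vi).sum() for g, vi in zip(grads, v)), params)
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    return norm.item()  # ||Hv|| with v ~ top eigenvector, i.e. the top eigenvalue

for step in range(2000):
    grads = torch.autograd.grad(loss_fn(), params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g
    if step % 200 == 0:
        print(step, loss_fn().item(), sharpness(), 2 / lr)
```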
The unicorn challenge got upgraded
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…
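For anyone new to the setup being compared: LoRA freezes the pretrained weight and learns only a low-rank additive update. A textbook-style sketch of the adapter (generic, not the post's exact configuration):

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """LoRA adapter around a frozen linear layer: W x + (alpha/r) * B A x,
    with A (r x in) and B (out x r) the only trainable parameters."""

    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank adapter is trained
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```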

Initial thoughts looking at these scores: the model is effectively GPT-5 level (while being 40% more expensive)
- Big Terminal Bench gains
- Similar to GPT-5, it has huge gains on TauBench - Telecom specifically
- SWE-bench is starting to plateau

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

I was never fully convinced that there's any hard ceiling for self-attention in transformers if one carefully applies classical remedies like multi-stage retrieval. Great job, whale!
🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model! ✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context. 👉 Now live on App, Web, and API. 💰 API prices cut by 50%+! 1/n
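The thread doesn't spell out DSA's mechanics, so purely as a generic illustration of why sparse attention helps at long context: if each query mixes only its top-k highest-scoring keys, the softmax and value-mixing cost scales with k rather than with the full sequence length. A hedged toy sketch (no causal masking, and not DeepSeek's DSA):

```python
import torch


def topk_sparse_attention(q, k, v, k_keep=64):
    """Generic top-k sparse attention: each query attends only to its k_keep
    best keys. Shapes: q, k, v are (batch, heads, seq, dim)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, H, Sq, Sk)
    k_keep = min(k_keep, scores.shape[-1])
    top_vals, top_idx = scores.topk(k_keep, dim=-1)         # keep the strongest keys per query
    probs = torch.softmax(top_vals, dim=-1)                 # softmax over kept keys only
    # gather the corresponding value vectors and mix them
    idx = top_idx.unsqueeze(-1).expand(*top_idx.shape, v.shape[-1])
    v_sel = torch.gather(v.unsqueeze(2).expand(-1, -1, q.shape[2], -1, -1), 3, idx)
    return (probs.unsqueeze(-1) * v_sel).sum(dim=3)         # (B, H, Sq, dim)
```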