
L

@CodeTitanium

L reposted

nice

diffusion transformers have come a long way, but most still lean on the old 2021 sd-vae for their latent space.

that causes a few big issues:
1. outdated backbones make the architecture more complex than it needs to be. the sd-vae runs at around 450 gflops, while a simple ViT-B…

sainingxie's tweet image.


L reposted

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)

sainingxie's tweet image.

L reposted

btw this discovery is by Mehtaab Sawhney (msawhney in the picture) mit.edu/~msawhney who has been doing lots of AWESOME work in combinatorics for many years now!


Mehtaab Sawhney — Clay Research Fellow and Assistant Professor at Columbia University. Research in combinatorics, probability, analytic number theory, and theoretical computer science.

gpt5-pro is superhuman at literature search: it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thread/3…) by realizing that it had actually been solved 20 years ago h/t @MarkSellke for pointing this out to me!

SebastienBubeck's tweet image.


L reposted

Things that have happened since jan 2025:
- Gemini 2.5 deepthink got a gold medal on the IMO
- unreleased openai model got a gold medal on the imo and ioi and a perfect score on the icpc
- same unreleased openai model coded for 9 hours autonomously and got second place in the…

Matt Walsh is now "feeling the AGI." Meanwhile there's been ~zero change in real capabilities since we hit Midwit Wall (120 IQ) in January 2025. It's over. Send all these AI scams to zero prompto. 🚨🐻📉

powerfultakes's tweet image.


L reposted

🚀 NorMuon: Muon + Neuron-wise adaptive learning rates: +21.7% training efficiency vs Adam, +11.3% vs Muon on 1.1B pretrain.

🚀 Distributed NorMuon: A highly efficient FSDP2 implementation.

Paper 👉 arxiv.org/abs/2510.05491
#LLM #AI #DeepLearning #Optimizer

tourzhao's tweet image.
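For concreteness, here is a rough Python sketch of the idea described above: a Muon-style orthogonalized momentum update combined with a neuron-wise (per output row) adaptive learning rate. This is not the authors' implementation; the momentum handling, normalization, and coefficients below are assumptions, and the actual algorithm is in arxiv.org/abs/2510.05491.

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately map G onto the nearest (semi-)orthogonal matrix, as in Muon,
    via a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315          # coefficients commonly used by Muon
    X = G / (G.norm() + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def normuon_step(W, grad, state, lr=0.02, beta1=0.95, beta2=0.95, eps=1e-8):
    """One NorMuon-style update for a 2-D weight matrix W (out_dim x in_dim):
    orthogonalize the momentum, then rescale each output neuron (row) by an
    Adam-like second-moment estimate of its update energy."""
    state.setdefault("m", torch.zeros_like(W))        # momentum buffer
    state.setdefault("v", torch.zeros(W.shape[0]))    # per-neuron second moment
    state["m"].mul_(beta1).add_(grad, alpha=1 - beta1)
    O = newton_schulz_orthogonalize(state["m"])       # Muon-style orthogonal update
    row_sq = O.pow(2).mean(dim=1)                     # per-row update energy
    state["v"].mul_(beta2).add_(row_sq, alpha=1 - beta2)
    scale = 1.0 / (state["v"].sqrt() + eps)           # neuron-wise adaptive LR
    scale = scale / scale.mean()                      # keep overall scale comparable (assumption)
    W.add_(O * scale.unsqueeze(1), alpha=-lr)

# toy usage
W, g, st = torch.randn(256, 128), torch.randn(256, 128), {}
normuon_step(W, g, st)
```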

L reposted

Ok I was unfair to @jm_alexia. had no time to read the TRM paper, I should have. It's a very good faith attempt to rescue the idea of enabling huge effective depth for a tiny model, without any shaky premises or unwarranted approximations of HRM. It may be a real, big finding.

teortaxesTex's tweet image.

TRM is one of the best papers I've read in the past years - it truly shows the unfiltered process of a researcher:
1) See awesome paper, get hyped about it
2) Read it - looks cool
3) Run it - doesn't work
4) Find glaring mistakes
5) Fix the issues
x.com/jm_alexia/stat…



L reposted

Steve Jobs predicted ChatGPT in 1985.


L reposted
miniapeur's tweet image.

L reposted

🚨 GPT-5 Pro just achieved 2 BIG results in Mathematics today!

— Solved a problem unsolved by LLMs and only 60 humans prior
— Solved an open problem in real analysis in computer science

Progress in AI seems very gradual, and then all of a sudden.

deedydas's tweet image.

L reposted

GPT-5 Pro just disproved a conjecture on the Simons Foundation's "Open Problems" list. Very impressive. I wonder where the goalposts will move now? CC @sebastienbubeck HT @arjunmanrai

GPT-5 Pro found a counterexample to the NICD-with-erasures majority optimality (Simons list, p.25). simons.berkeley.edu/sites/default/… At p=0.4, n=5, f(x) = sign(x_1-3x_2+x_3-x_4+3x_5) gives E|f(x)|=0.43024 vs best majority 0.42904.

PI010101's tweet image.
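The quoted numbers are small enough to re-check by brute force. Below is a minimal Python sketch, assuming the NICD-with-erasures objective is E_y |E[f(x) | y]| with x uniform on {-1,1}^n and each coordinate erased independently with probability p; that reading of the problem statement, and the tie-free sign conventions, are assumptions here.

```python
from itertools import product

p, n = 0.4, 5
w = [1, -3, 1, -1, 3]                     # f(x) = sign(x1 - 3x2 + x3 - x4 + 3x5)

def f(x):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1             # all weights odd, so s is never 0

def majority(x):
    return 1 if sum(x) > 0 else -1        # n = 5 is odd, so no ties

def score(fn):
    """E over y of |E[f(x) | y]|, by exact enumeration of erasure patterns,
    revealed values, and hidden values."""
    total = 0.0
    for erased in product([False, True], repeat=n):
        p_mask = 1.0
        for e in erased:
            p_mask *= p if e else (1 - p)
        rev = [i for i in range(n) if not erased[i]]
        hid = [i for i in range(n) if erased[i]]
        for rev_vals in product([-1, 1], repeat=len(rev)):
            cond = 0.0                     # conditional expectation of f given y
            for hid_vals in product([-1, 1], repeat=len(hid)):
                x = [0] * n
                for i, v in zip(rev, rev_vals):
                    x[i] = v
                for i, v in zip(hid, hid_vals):
                    x[i] = v
                cond += fn(x)
            cond /= 2 ** len(hid)
            total += p_mask * 0.5 ** len(rev) * abs(cond)
    return total

print(f"weighted threshold:        {score(f):.5f}")         # quoted as 0.43024
print(f"majority over all 5 bits:  {score(majority):.5f}")  # quoted best majority: 0.42904
```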


L reposted

banger blogpost

cloneofsimo's tweet image.

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.



L reposted

what a beautiful theory!

YouJiacheng's tweet image.

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.



L reposted

Even with full-batch gradients, DL optimizers defy classical optimization theory, as they operate at the *edge of stability.* With @alex_damian_, we introduce "central flows": a theoretical tool to analyze these dynamics that makes accurate quantitative predictions on real NNs.
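For readers new to the term, here is a minimal sketch (my own toy example, not the authors' code) of what "edge of stability" refers to: under full-batch gradient descent with step size eta, the top Hessian eigenvalue ("sharpness") tends to rise toward, and then hover around, the classical stability threshold 2/eta rather than staying safely below it, which is the regime the central-flows analysis targets.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
params = list(model.parameters())
loss_fn = nn.MSELoss()
eta = 0.05                                            # full-batch GD step size

def sharpness(n_iter=20):
    """Top Hessian eigenvalue of the training loss, via power iteration on
    Hessian-vector products."""
    loss = loss_fn(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    eig = 0.0
    for _ in range(n_iter):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = (v @ hv).item()                         # Rayleigh quotient estimate
        v = hv / hv.norm()
    return eig

for step in range(2001):
    loss = loss_fn(model(X), y)
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= eta * g                              # plain full-batch gradient descent
    if step % 200 == 0:
        print(f"step {step:4d}  loss {loss.item():.4f}  "
              f"sharpness {sharpness():.1f}  (2/eta = {2/eta:.1f})")
```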


L reposted

The unicorn challenge got upgraded


L reposted

Sound on.


L reposted

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…

thinkymachines's tweet image.
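For readers unfamiliar with the setup being compared, here is a minimal, illustrative LoRA layer in PyTorch (my sketch, not Thinking Machines' code): the base weights stay frozen and only a low-rank update B @ A, scaled by alpha/r, is trained, which is what makes fine-tuning so much cheaper than updating every weight.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear augmented with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # freeze the full weight matrix
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: starts equal to the base layer
        self.scale = alpha / r

    def forward(self, x):
        # equivalent to (W + scale * B @ A) x, without materializing the sum
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))                    # only A and B receive gradients
```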

L reposted

Initial thoughts looking at these scores: the model is effectively GPT-5 level (while being 40% more expensive)
- Big Terminal Bench gains
- Similar to GPT5, it has huge gains on TauBench - Telecom specifically.
- SWEBench is starting to plateau

nrehiew_'s tweet image.

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

claudeai's tweet image.


L reposted

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

claudeai's tweet image.

L reposted

I was never fully convinced that there's any hard ceiling for self-attention in transformers if one carefully applies classical remedies like multi-stage retrieval. Great job, whale!

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model!
✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+! 1/n
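For intuition only, here is a generic top-k sparse-attention sketch in PyTorch. This is not DSA's actual mechanism (their report specifies that); it only illustrates the broad idea of letting each query attend to a small selected subset of keys rather than the full quadratic set. Note this toy version still materializes the full score matrix, so it shows the selection step, not the efficiency gain.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """q, k, v: (batch, heads, seq_len, head_dim). Each query attends only to its
    top_k highest-scoring keys; causal masking omitted for brevity."""
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale        # (B, H, L, L) dense scores
    top_k = min(top_k, scores.shape[-1])
    idx = scores.topk(top_k, dim=-1).indices          # indices of the kept keys
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, idx, 0.0)                       # 0 where kept, -inf elsewhere
    attn = F.softmax(scores + mask, dim=-1)           # probability mass only on kept keys
    return attn @ v

q = k = v = torch.randn(1, 8, 1024, 64)
out = topk_sparse_attention(q, k, v)                  # (1, 8, 1024, 64)
```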



L reposted

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model!
✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention (DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+! 1/n

