/MachineLearning
@slashML
Our paper "Vision Transformers Don't Need Trained Registers" will appear as a Spotlight at NeurIPS 2025! We uncover the mechanism behind high-norm tokens and attention sinks in ViTs, propose a training-free fix, and recently added an analytical model -- more on that below. ⬇️
Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
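A minimal sketch of the test-time idea described in the tweet above, under my own assumptions: detect high-norm patch tokens and shunt their content into an extra appended "register" slot. The thresholding rule and the redirection strategy here are illustrative guesses, not the paper's actual implementation.

```python
# Hypothetical illustration of "test-time registers": move high-norm outlier
# patch tokens into an appended register token and replace them with ordinary
# (inlier-mean) features. Threshold and redirection are assumptions, not the
# paper's method.
import torch

def add_test_time_register(tokens: torch.Tensor, num_std: float = 3.0) -> torch.Tensor:
    """tokens: [batch, num_patches, dim] patch features from a ViT block."""
    norms = tokens.norm(dim=-1)                                        # [B, N]
    thresh = norms.mean(dim=1, keepdim=True) + num_std * norms.std(dim=1, keepdim=True)
    outlier = norms > thresh                                           # [B, N]

    out = tokens.clone()
    registers = torch.zeros(tokens.size(0), 1, tokens.size(-1), dtype=tokens.dtype)
    for b in range(tokens.size(0)):
        if outlier[b].any():
            # Register absorbs the outlier content ...
            registers[b, 0] = tokens[b, outlier[b]].mean(dim=0)
            # ... and the outlier positions get ordinary inlier-mean features.
            out[b, outlier[b]] = tokens[b, ~outlier[b]].mean(dim=0)
    return torch.cat([out, registers], dim=1)                          # [B, N + 1, dim]

# Example: 2 images, 196 patches, 768-dim features, with injected outliers.
feats = torch.randn(2, 196, 768)
feats[:, 0] *= 50.0
print(add_test_time_register(feats).shape)  # torch.Size([2, 197, 768])
```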
Introducing Petri Dish Neural Cellular Automata (PD-NCA) 🦠 Open-ended complexification, a north star of Artificial Life (ALife) simulations, is a question that fascinates us deeply. In this work we explore the role of continual adaptation in ALife simulation,…
1.1M tokens/sec on just one rack of GB300 GPUs in our Azure fleet. An industry record made possible by our longstanding co-innovation with NVIDIA and expertise in running AI at production scale! techcommunity.microsoft.com/blog/azurehigh…
All these laptops are shipping with NPUs. Intel + AMD + Apple + Qualcomm all have one, and they are often heavily featured in the marketing material. Is anyone using them? If so, how?
AI research is overfitting to hardware
I was puzzled by their paper's claim that "bfloat16" training crashes -- we trained stably for 100,000 GPU hours and 7K+ training steps on both dense and MoE models in the ScaleRL paper without any crashes. I think it matters what kind of GPUs they used -- they mention in the…
So is the formula just to name the most famous institutions and call it an X paper? Neither the first nor the last author is from Anthropic or Stanford. I get that reputation matters for publicity, but it does seem a little disrespectful.
New Stanford+Anthropic paper shows long step-by-step prompts can break model safety and trigger harmful answers. 😟 Long reasoning can quietly neutralize safety checks that people assume are working. The trick adds a benign puzzle and long reasoning before the harmful ask, plus…
🚀Excited to share our new work! 💊Problem: The BF16 precision causes a large training-inference mismatch, leading to unstable RL training. 💡Solution: Just switch to FP16. 🎯That's it. 📰Paper: arxiv.org/pdf/2510.26788 ⭐️Code: github.com/sail-sg/Precis…
FP16 can have a smaller training-inference gap than BFloat16, and thus fits RL better. Even the difference between RL algorithms vanishes once FP16 is adopted. Surprising!
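A quick toy illustration of why the two precisions behave differently (my own example, not from the paper): bfloat16 keeps float32's exponent range but only ~8 significand bits, while float16 has ~11 significand bits, so round-trip rounding error is typically much smaller in fp16 as long as values stay within its narrower range.

```python
# Toy comparison of round-trip rounding error for bf16 vs fp16 (illustrative only;
# the paper's argument is about the train/inference numeric mismatch in RL, not
# this micro-benchmark).
import torch

x = torch.randn(1_000_000, dtype=torch.float32)

def roundtrip_error(x: torch.Tensor, dtype: torch.dtype) -> float:
    return (x - x.to(dtype).to(torch.float32)).abs().mean().item()

print(f"bf16 mean abs rounding error: {roundtrip_error(x, torch.bfloat16):.2e}")
print(f"fp16 mean abs rounding error: {roundtrip_error(x, torch.float16):.2e}")
# fp16 error is roughly 8x smaller here, but it trades away bf16's wide exponent
# range, so it can overflow/underflow where bf16 would not.
```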
The trick below to align tokens with different tokenizers is a cute idea -- this allows you to run on-policy distillation with teacher logprobs for sampled tokens even when student and teacher belong to different model families (e.g., Qwen vs Llama). There's more we need to do…
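A rough sketch of one way such cross-tokenizer alignment can work (my reading of the trick, with hand-written token lists standing in for real tokenizer output): map each student token to the teacher tokens whose character spans overlap it in the shared decoded string.

```python
# Hypothetical character-span alignment between two tokenizations of the same
# string. Token lists are illustrative stand-ins, not real tokenizer calls.
def char_spans(tokens):
    """Return (start, end) character offsets per token, assuming the
    concatenated tokens reproduce the original string exactly."""
    spans, pos = [], 0
    for t in tokens:
        spans.append((pos, pos + len(t)))
        pos += len(t)
    return spans

def align(student_tokens, teacher_tokens):
    """For each student token, list indices of teacher tokens that overlap it."""
    s_spans, t_spans = char_spans(student_tokens), char_spans(teacher_tokens)
    mapping = []
    for s_start, s_end in s_spans:
        overlaps = [i for i, (t_start, t_end) in enumerate(t_spans)
                    if t_start < s_end and s_start < t_end]
        mapping.append(overlaps)
    return mapping

student = ["The", " cat", " sat"]        # e.g. one model family's tokenization
teacher = ["The", " ca", "t", " sat"]    # e.g. another family's tokenization
print(align(student, teacher))           # [[0], [1, 2], [3]]
```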
HRM-Agent: Using the Hierarchical Reasoning Model in Reinforcement Learning Paper: arxiv.org/abs/2510.22832 The Hierarchical Reasoning Model (HRM) has impressive reasoning abilities given its small size, but has only been applied to supervised, static, fully-observable problems.
OpenAI successfully converts from non-profit to un-profitable for-profit.
We completed our recapitalization. The non-profit, the OpenAI Foundation, is now one of the best resourced philanthropies ever, with equity valued at ~$130B. It continues to control the OpenAI for-profit, which is now a public benefit corporation. openai.com/index/built-to…
Geoffrey Hinton says: "I'm more optimistic now, not because we'll control AI, but because we might not need to. Don't try to dominate superintelligence; design it to care, like a mother wired to protect her child." Control through attachment, not power. We want AI to be like that.
Spiking neural network from scratch achieves 8% accuracy, with no backpropagation or SGD. I created a genetic hyperparameter optimizer, and it now gets 8% accuracy on average, which is ~3% above chance. Link to source code with a detailed video and markdown explanations in comment…
I built a biologically inspired spiking neural network from scratch and it learned to do addition with 5% accuracy :) There is no backpropagation, no artificial loss functions - just spikes, synapses, and dopamine-like reward signals. It uses STDP -> "Spike-Timing-Dependent…
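For readers unfamiliar with STDP, here is a minimal pair-based update rule (a generic textbook form, not the author's actual code): a synapse is strengthened when the presynaptic neuron fires shortly before the postsynaptic one, and weakened when the order is reversed.

```python
# Minimal pair-based STDP weight update (generic textbook rule, not the author's
# implementation). dt_ms = t_post - t_pre in milliseconds.
import math

def stdp_delta_w(dt_ms: float,
                 a_plus: float = 0.01, a_minus: float = 0.012,
                 tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    if dt_ms > 0:    # pre fired before post -> potentiate (LTP)
        return a_plus * math.exp(-dt_ms / tau_plus)
    elif dt_ms < 0:  # post fired before pre -> depress (LTD)
        return -a_minus * math.exp(dt_ms / tau_minus)
    return 0.0

for dt in (+5.0, +20.0, -5.0, -20.0):
    print(f"dt = {dt:+.0f} ms -> delta_w = {stdp_delta_w(dt):+.5f}")
```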
I disagree with @ylecun on this. We have a pretty good idea at Tesla on how we can make general humanoids a reality very quickly. Funny anecdote: Yann was advising me to launch what became the first production vision based deep neural network at Google. His feedback: use convs,…
Meta's Chief AI Scientist Yann LeCun offers a critical take on the humanoid robot boom. Speaking at MIT, LeCun claimed the "big secret" of the industry is that current companies "have no idea" how to make their robots "smart enough to be generally useful." He argues that while…
Haha. I am afraid people interpreted my “delete tokenizer” as “use bytes directly without BPE”; the issue is that you *still* inherit the arbitrariness of the bytes encoding even for that! Pixels is the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
Sakana AI’s CTO says he’s ‘absolutely sick’ of transformers, the tech that powers every major AI model “You should only do the research that wouldn’t happen if you weren’t doing it.” (@thisismyhat) 🧠 @YesThisIsLion venturebeat.com/ai/sakana-ais-…
Meta laid off 600 people from its Superintelligence Lab today. Many FAIR researchers, including FAIR Research Scientist Director Yuandong Tian, were affected. I think Yann LeCun will leave soon. Maybe I should raise $2B and start a new frontier lab with these folks.
ARC Prize announces all validated scores on ARC-AGI. We have not verified MythWorx's 100% claim in their recent $100M-valuation fundraise press release. We would be open to verifying their score (assuming it passes the testing policy) for the founder and their investors.
Introducing the compact, dense versions of Qwen3-VL — now available in 4B and 8B pairs, each with both Instruct and Thinking variants. ✅ Lower VRAM usage ✅ Full Qwen3-VL capabilities retained ✅ Strong performance across the board Despite their size, they outperform models…
Today we're sharing the next phase of Reflection. We're building frontier open intelligence accessible to all. We've assembled an extraordinary AI team, built a frontier LLM training stack, and raised $2 billion. Why Open Intelligence Matters: Technological and scientific…