
/MachineLearning

@slashML

/MachineLearning reposted

Our paper "Vision Transformers Don't Need Trained Registers" will appear as a Spotlight at NeurIPS 2025! We uncover the mechanism behind high-norm tokens and attention sinks in ViTs, propose a training-free fix, and recently added an analytical model -- more on that below. ⬇️

Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
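A minimal sketch of what "registers at ✨test-time✨" could look like mechanically, based on the thread's description. The outlier-neuron indices and the exact shift rule here are my assumptions for illustration, not the paper's code:

```python
# Sketch (my reading of the idea, not the authors' code): append an extra
# register token at inference and shift the high-norm outlier activations
# from patch tokens into it, so the patch features stay clean.
import torch

def add_test_time_register(tokens: torch.Tensor, outlier_dims: torch.Tensor):
    """tokens: (batch, n_patches, dim); outlier_dims: indices of neurons
    (hypothetical, identified beforehand) carrying the high-norm activity."""
    b, n, d = tokens.shape
    register = torch.zeros(b, 1, d, dtype=tokens.dtype, device=tokens.device)
    # Move the outlier neurons' mass into the register token...
    register[:, 0, outlier_dims] = tokens[:, :, outlier_dims].amax(dim=1)
    # ...and zero those neurons out in the patch tokens.
    tokens = tokens.clone()
    tokens[:, :, outlier_dims] = 0.0
    return torch.cat([tokens, register], dim=1)
```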



/MachineLearning reposted

Introducing Petri Dish Neural Cellular Automata (PD-NCA) 🦠 Open-ended complexification, a north star of Artificial Life (ALife) simulations, is a question that fascinates us deeply. In this work we explore the role of continual adaptation in ALife simulation,…
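For context, a generic NCA update step (the standard recipe, not PD-NCA's code; the callables here are placeholders) looks roughly like:

```python
# Generic Neural Cellular Automata step: each cell perceives its neighborhood
# with fixed convolutions, then a small learned network proposes an update.
import torch

def nca_step(state, perceive, update_mlp, fire_rate=0.5):
    """state: (batch, channels, h, w) grid of per-cell state vectors.
    perceive: fixed convs (e.g. identity + Sobel); update_mlp: 1x1 convs."""
    perception = perceive(state)
    delta = update_mlp(perception)
    # Stochastic update: only a random subset of cells fires each step.
    mask = (torch.rand_like(state[:, :1]) < fire_rate).float()
    return state + delta * mask
```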


/MachineLearning reposted

1.1M tokens/sec on just one rack of GB300 GPUs in our Azure fleet. An industry record made possible by our longstanding co-innovation with NVIDIA and expertise of running AI at production scale! techcommunity.microsoft.com/blog/azurehigh…
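(For scale: assuming a GB300 NVL72 rack with 72 GPUs, 1.1M tokens/sec works out to roughly 15K tokens/sec per GPU -- my arithmetic, not a figure from the post.)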


/MachineLearning reposted

All these laptops are shipping with NPUs. Intel + AMD + Apple + Qualcomm all have one, and they are often heavily featured in the marketing material. Is anyone using them? If so, how?


AI research is overfitting to hardware

I was puzzled about why their paper claims "bfloat16" training crashes -- since in the ScaleRL paper we stably trained both dense models and MoEs for 100,000 GPU hours and 7K+ training steps without any crashes. I think it matters what kind of GPUs they used -- they mention in the…



/MachineLearning reposted

So is the formula just to name the most famous institutions and call it an X paper? Neither the first nor the last author is from Anthropic or Stanford. I get that reputation matters for publicity, but it does seem a little disrespectful.

New Stanford+Anthropic paper shows long step-by-step prompts can break model safety and trigger harmful answers. 😟 Long reasoning can quietly neutralize safety checks that people assume are working. The trick adds a benign puzzle and long reasoning before the harmful ask, plus…



/MachineLearning reposted

🚀Excited to share our new work! 💊Problem: The BF16 precision causes a large training-inference mismatch, leading to unstable RL training. 💡Solution: Just switch to FP16. 🎯That's it. 📰Paper: arxiv.org/pdf/2510.26788 ⭐️Code: github.com/sail-sg/Precis…
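The FP16-vs-BF16 tradeoff in one snippet (my illustration, not from the paper): FP16 keeps 10 mantissa bits while BF16 keeps only 7, so near 1.0 FP16 resolves ~8x finer steps; BF16's payoff is FP32-like exponent range, which is why it rarely overflows in training.

```python
# Numeric illustration of the precision gap between the two 16-bit formats.
import torch

x = torch.tensor(1.0 + 2**-10)            # a value just above 1.0
print(x.to(torch.float16).item())         # 1.0009765625 (representable)
print(x.to(torch.bfloat16).item())        # 1.0          (rounded away)
print(torch.finfo(torch.float16).eps)     # ~0.000977 (2**-10)
print(torch.finfo(torch.bfloat16).eps)    # ~0.0078   (2**-7)
```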


/MachineLearning reposted

FP16 can have a smaller training-inference gap than BFloat16, and thus fits RL better. Even the difference between RL algorithms vanishes once FP16 is adopted. Surprising!


/MachineLearning reposted

The trick below to align tokens with different tokenizers is a cute idea -- this allows you to run on-policy distillation with teacher logprobs for sampled tokens even when student and teacher belong to different model families (e.g., Qwen vs Llama). There's more we need to do…
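One plausible version of such an alignment (a sketch under my own assumptions -- the linked thread's trick may differ in details) is to match tokens by character spans, which "fast" HuggingFace tokenizers expose as offset mappings:

```python
# Align tokens across two tokenizers via character-offset overlap.
from transformers import AutoTokenizer

def align_tokens(text, student_tok, teacher_tok):
    s = student_tok(text, return_offsets_mapping=True, add_special_tokens=False)
    t = teacher_tok(text, return_offsets_mapping=True, add_special_tokens=False)
    alignment = []
    for i, (s0, s1) in enumerate(s["offset_mapping"]):
        # Teacher tokens whose character span overlaps this student token.
        overlap = [j for j, (t0, t1) in enumerate(t["offset_mapping"])
                   if t0 < s1 and t1 > s0]
        alignment.append((i, overlap))
    return alignment

# Hypothetical model ids, purely for illustration:
# align_tokens("hello world",
#              AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B"),
#              AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B"))
```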


/MachineLearning reposted

HRM-Agent: Using the Hierarchical Reasoning Model in Reinforcement Learning Paper: arxiv.org/abs/2510.22832 The Hierarchical Reasoning Model (HRM) has impressive reasoning abilities given its small size, but has only been applied to supervised, static, fully-observable problems.
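For context, HRM's core loop (my paraphrase of the original HRM design; function names are placeholders) nests a fast low-level recurrent module inside a slow high-level one:

```python
# Sketch of HRM's nested recurrence: the low-level module runs t_steps fast
# updates for every single update of the high-level module, repeated for
# n_cycles. f_low / f_high are placeholder recurrent modules.
def hrm_forward(x, f_low, f_high, z_low, z_high, n_cycles=4, t_steps=4):
    for _ in range(n_cycles):
        for _ in range(t_steps):
            z_low = f_low(z_low, z_high, x)   # fast, detailed computation
        z_high = f_high(z_high, z_low)        # slow, abstract update
    return z_high
```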


OpenAI successfully converts from non-profit to un-profitable for-profit.

We completed our recapitalization. The non-profit, the OpenAI Foundation, is now one of the best resourced philanthropies ever, with equity valued at ~$130B. It continues to control the OpenAI for-profit, which is now a public benefit corporation. openai.com/index/built-to…



/MachineLearning reposted

Geoffrey Hinton says he's more optimistic now, not because we'll control AI, but because we might not need to: "Don't try to dominate superintelligence; design it to care, like a mother wired to protect her child." Control through attachment, not power. We want AI to be like that.


/MachineLearning reposted

Spiking Neural Network from scratch achieves 8% accuracy. No backpropagation or SGD. I created a genetic hyperparameter optimizer and it now, on average, can get 8% accuracy, which is ~3% above chance. Link to source code with a detailed video and markdown explanations in comment…

I built a biologically inspired spiking neural network from scratch and it learned to do addition with 5% accuracy :) There is no backpropagation, no artificial loss functions - just spikes, synapses, and dopamine-like reward signals. It uses STDP -> "Spike-Timing-Dependent…
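For readers unfamiliar with STDP, here is a textbook-style reward-modulated update rule (a generic sketch, not the poster's code): weights strengthen when the presynaptic spike precedes the postsynaptic one, weaken otherwise, scaled by a dopamine-like reward signal.

```python
# Reward-modulated Spike-Timing-Dependent Plasticity, in one function.
import math

def stdp_update(w, dt, reward, a_plus=0.01, a_minus=0.012, tau=20.0):
    """dt = t_post - t_pre in ms; positive means pre fired before post."""
    if dt > 0:
        dw = a_plus * math.exp(-dt / tau)    # potentiation
    else:
        dw = -a_minus * math.exp(dt / tau)   # depression
    return w + reward * dw
```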



/MachineLearning reposted

I disagree with @ylecun on this. We have a pretty good idea at Tesla on how we can make general humanoids a reality very quickly. Funny anecdote: Yann was advising me to launch what became the first production vision based deep neural network at Google. His feedback: use convs,…

Meta's Chief AI Scientist Yann LeCun offers a critical take on the humanoid robot boom. Speaking at MIT, LeCun claimed the "big secret" of the industry is that current companies "have no idea" how to make their robots "smart enough to be generally useful." He argues that while…



/MachineLearning reposted

Haha. I am afraid people interpreted my "delete tokenizer" as "use bytes directly without BPE" -- the issue is you *still* inherit the arbitrariness of byte encoding even for that! Pixels is the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
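A concrete instance of that byte-encoding arbitrariness (my illustration): the same visible character can be encoded as different, equally valid UTF-8 byte sequences, so "raw bytes" still bake in an encoding convention the model must learn.

```python
# NFC vs NFD normalization: one glyph, two different UTF-8 byte strings.
import unicodedata

s = "é"
nfc = unicodedata.normalize("NFC", s).encode("utf-8")   # b'\xc3\xa9'
nfd = unicodedata.normalize("NFD", s).encode("utf-8")   # b'e\xcc\x81'
print(nfc, nfd, nfc == nfd)   # same glyph, different bytes -> False
```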


/MachineLearning reposted

Sakana AI’s CTO says he’s ‘absolutely sick’ of transformers, the tech that powers every major AI model “You should only do the research that wouldn’t happen if you weren’t doing it.” (@thisismyhat) 🧠 @YesThisIsLion venturebeat.com/ai/sakana-ais-…


/MachineLearning reposted

Meta laid off 600 people from its Superintelligence Lab today. Many FAIR researchers, including FAIR Research Scientist Director Yuandong Tian, were affected. I think Yann LeCun will leave soon. Maybe I should raise $2B and start a new frontier lab with these folks.


/MachineLearning reposted

ARC Prize announces all validated scores on ARC-AGI. We have not verified MythWorx's 100% claim, made in the press release for their recent fundraise at a $100M valuation. We would be open to verifying their score (assuming it passes the testing policy) for the founder and their investors.


/MachineLearning reposted

Introducing the compact, dense versions of Qwen3-VL — now available in 4B and 8B pairs, each with both Instruct and Thinking variants. ✅ Lower VRAM usage ✅ Full Qwen3-VL capabilities retained ✅ Strong performance across the board Despite their size, they outperform models…
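A hypothetical usage sketch via the transformers image-text-to-text pipeline -- the model id below follows Qwen's usual naming but is my assumption, not verified from the post:

```python
# Assumed model id "Qwen/Qwen3-VL-4B-Instruct"; swap in the real one from
# the release. Requires a recent transformers with the image-text-to-text task.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3-VL-4B-Instruct")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/cat.jpg"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
print(pipe(text=messages, max_new_tokens=60))
```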


/MachineLearning reposted

Today we're sharing the next phase of Reflection. We're building frontier open intelligence accessible to all. We've assembled an extraordinary AI team, built a frontier LLM training stack, and raised $2 billion. Why Open Intelligence Matters Technological and scientific…

