
/MachineLearning

@slashML

/MachineLearning reposted

Our paper "Vision Transformers Don't Need Trained Registers" will appear as a Spotlight at NeurIPS 2025! We uncover the mechanism behind high-norm tokens and attention sinks in ViTs, propose a training-free fix, and recently added an analytical model -- more on that below. ⬇️

Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
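A minimal sketch of what "registers at ✨test-time✨" could look like mechanically, based on the thread's description. The outlier-neuron indices and the exact shift rule here are my assumptions for illustration, not the paper's code:

```python
# Sketch (my reading of the idea, not the authors' code): append an extra
# register token at inference and shift the high-norm outlier activations
# from patch tokens into it, so the patch features stay clean.
import torch

def add_test_time_register(tokens: torch.Tensor, outlier_dims: torch.Tensor):
    """tokens: (batch, n_patches, dim); outlier_dims: indices of neurons
    (hypothetical, identified beforehand) carrying the high-norm activity."""
    b, n, d = tokens.shape
    register = torch.zeros(b, 1, d, dtype=tokens.dtype, device=tokens.device)
    # Move the outlier neurons' mass into the register token...
    register[:, 0, outlier_dims] = tokens[:, :, outlier_dims].amax(dim=1)
    # ...and zero those neurons out in the patch tokens.
    tokens = tokens.clone()
    tokens[:, :, outlier_dims] = 0.0
    return torch.cat([tokens, register], dim=1)
```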



/MachineLearning reposted

Introducing Petri Dish Neural Cellular Automata (PD-NCA) 🦠 Open-ended complexification, a north star of Artificial Life (ALife) simulations, is a question that fascinates us deeply. In this work we explore the role of continual adaptation in ALife simulation,…
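For context, a generic NCA update step (the standard recipe, not PD-NCA's code; the callables here are placeholders) looks roughly like:

```python
# Generic Neural Cellular Automata step: each cell perceives its neighborhood
# with fixed convolutions, then a small learned network proposes an update.
import torch

def nca_step(state, perceive, update_mlp, fire_rate=0.5):
    """state: (batch, channels, h, w) grid of per-cell state vectors.
    perceive: fixed convs (e.g. identity + Sobel); update_mlp: 1x1 convs."""
    perception = perceive(state)
    delta = update_mlp(perception)
    # Stochastic update: only a random subset of cells fires each step.
    mask = (torch.rand_like(state[:, :1]) < fire_rate).float()
    return state + delta * mask
```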


/MachineLearning reposted

1.1M tokens/sec on just one rack of GB300 GPUs in our Azure fleet. An industry record made possible by our longstanding co-innovation with NVIDIA and expertise of running AI at production scale! techcommunity.microsoft.com/blog/azurehigh…
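(For scale: assuming a GB300 NVL72 rack with 72 GPUs, 1.1M tokens/sec works out to roughly 15K tokens/sec per GPU -- my arithmetic, not a figure from the post.)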


/MachineLearning reposted

All these laptops are shipping with NPUs. Intel + AMD + Apple + Qualcomm all have one, and they are often heavily featured in the marketing material. Is anyone using them? If so, how?


AI research is overfitting to hardware

I was puzzled about why their paper claims "bfloat16" training crashes -- since in the ScaleRL paper we stably trained both dense models and MoEs for 100,000 GPU hours and 7K+ training steps without any crashes. I think it matters what kind of GPUs they used -- they mention in the…



/MachineLearning reposted

So is the formula just to name the most famous institutions and call it an X paper? Neither the first nor the last author is from Anthropic or Stanford. I get that reputation matters for publicity, but it does seem a little disrespectful.

New Stanford+Anthropic paper shows long step-by-step prompts can break model safety and trigger harmful answers. 😟 Long reasoning can quietly neutralize safety checks that people assume are working. The trick adds a benign puzzle and long reasoning before the harmful ask, plus…



/MachineLearning reposted

🚀Excited to share our new work! 💊Problem: The BF16 precision causes a large training-inference mismatch, leading to unstable RL training. 💡Solution: Just switch to FP16. 🎯That's it. 📰Paper: arxiv.org/pdf/2510.26788 ⭐️Code: github.com/sail-sg/Precis…
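The FP16-vs-BF16 tradeoff in one snippet (my illustration, not from the paper): FP16 keeps 10 mantissa bits while BF16 keeps only 7, so near 1.0 FP16 resolves ~8x finer steps; BF16's payoff is FP32-like exponent range, which is why it rarely overflows in training.

```python
# Numeric illustration of the precision gap between the two 16-bit formats.
import torch

x = torch.tensor(1.0 + 2**-10)            # a value just above 1.0
print(x.to(torch.float16).item())         # 1.0009765625 (representable)
print(x.to(torch.bfloat16).item())        # 1.0          (rounded away)
print(torch.finfo(torch.float16).eps)     # ~0.000977 (2**-10)
print(torch.finfo(torch.bfloat16).eps)    # ~0.0078   (2**-7)
```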


/MachineLearning reposted

FP16 can have a smaller training-inference gap than BFloat16, and thus fits RL better. Even the difference between RL algorithms vanishes once FP16 is adopted. Surprising!


/MachineLearning reposted

The trick below to align tokens with different tokenizers is a cute idea -- this allows you to run on-policy distillation with teacher logprobs for sampled tokens even when student and teacher belong to different model families (e.g., Qwen vs Llama). There's more we need to do…
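One plausible version of such an alignment (a sketch under my own assumptions -- the linked thread's trick may differ in details) is to match tokens by character spans, which "fast" HuggingFace tokenizers expose as offset mappings:

```python
# Align tokens across two tokenizers via character-offset overlap.
from transformers import AutoTokenizer

def align_tokens(text, student_tok, teacher_tok):
    s = student_tok(text, return_offsets_mapping=True, add_special_tokens=False)
    t = teacher_tok(text, return_offsets_mapping=True, add_special_tokens=False)
    alignment = []
    for i, (s0, s1) in enumerate(s["offset_mapping"]):
        # Teacher tokens whose character span overlaps this student token.
        overlap = [j for j, (t0, t1) in enumerate(t["offset_mapping"])
                   if t0 < s1 and t1 > s0]
        alignment.append((i, overlap))
    return alignment

# Hypothetical model ids, purely for illustration:
# align_tokens("hello world",
#              AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B"),
#              AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B"))
```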


/MachineLearning reposted

HRM-Agent: Using the Hierarchical Reasoning Model in Reinforcement Learning Paper: arxiv.org/abs/2510.22832 The Hierarchical Reasoning Model (HRM) has impressive reasoning abilities given its small size, but has only been applied to supervised, static, fully-observable problems.
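For context, HRM's core loop (my paraphrase of the original HRM design; function names are placeholders) nests a fast low-level recurrent module inside a slow high-level one:

```python
# Sketch of HRM's nested recurrence: the low-level module runs t_steps fast
# updates for every single update of the high-level module, repeated for
# n_cycles. f_low / f_high are placeholder recurrent modules.
def hrm_forward(x, f_low, f_high, z_low, z_high, n_cycles=4, t_steps=4):
    for _ in range(n_cycles):
        for _ in range(t_steps):
            z_low = f_low(z_low, z_high, x)   # fast, detailed computation
        z_high = f_high(z_high, z_low)        # slow, abstract update
    return z_high
```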


OpenAI successfully converts from non-profit to un-profitable for-profit.

We completed our recapitalization. The non-profit, the OpenAI Foundation, is now one of the best resourced philanthropies ever, with equity valued at ~$130B. It continues to control the OpenAI for-profit, which is now a public benefit corporation. openai.com/index/built-to…



/MachineLearning reposted

Geoffrey Hinton says he's more optimistic now, not because we'll control AI, but because we might not need to: "Don't try to dominate superintelligence; design it to care, like a mother wired to protect her child." Control through attachment, not power. We want AI to be like that.


/MachineLearning reposted

Spiking Neural Network from scratch achieves 8% accuracy. No backpropagation or SGD. I created a genetic hyperparameter optimizer and it now, on average, can get 8% accuracy, which is ~3% above chance. Link to source code with a detailed video and markdown explanations in comment…

I built a biologically inspired spiking neural network from scratch and it learned to do addition with 5% accuracy :) There is no backpropagation, no artificial loss functions - just spikes, synapses, and dopamine-like reward signals. It uses STDP -> "Spike-Timing-Dependent…
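For readers unfamiliar with STDP, here is a textbook-style reward-modulated update rule (a generic sketch, not the poster's code): weights strengthen when the presynaptic spike precedes the postsynaptic one, weaken otherwise, scaled by a dopamine-like reward signal.

```python
# Reward-modulated Spike-Timing-Dependent Plasticity, in one function.
import math

def stdp_update(w, dt, reward, a_plus=0.01, a_minus=0.012, tau=20.0):
    """dt = t_post - t_pre in ms; positive means pre fired before post."""
    if dt > 0:
        dw = a_plus * math.exp(-dt / tau)    # potentiation
    else:
        dw = -a_minus * math.exp(dt / tau)   # depression
    return w + reward * dw
```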



/MachineLearning reposted

I disagree with @ylecun on this. We have a pretty good idea at Tesla on how we can make general humanoids a reality very quickly. Funny anecdote: Yann was advising me to launch what became the first production vision based deep neural network at Google. His feedback: use convs,…

Meta's Chief AI Scientist Yann LeCun offers a critical take on the humanoid robot boom. Speaking at MIT, LeCun claimed the "big secret" of the industry is that current companies "have no idea" how to make their robots "smart enough to be generally useful." He argues that while…



/MachineLearning reposted

Haha. I am afraid people interpreted my "delete tokenizer" as "use bytes directly without BPE" -- the issue is you *still* inherit the arbitrariness of byte encoding even for that! Pixels is the only way. Just like humans. It is written. If GPT-10 uses utf8 at the input I will eat a shoe.
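A concrete instance of that byte-encoding arbitrariness (my illustration): the same visible character can be encoded as different, equally valid UTF-8 byte sequences, so "raw bytes" still bake in an encoding convention the model must learn.

```python
# NFC vs NFD normalization: one glyph, two different UTF-8 byte strings.
import unicodedata

s = "é"
nfc = unicodedata.normalize("NFC", s).encode("utf-8")   # b'\xc3\xa9'
nfd = unicodedata.normalize("NFD", s).encode("utf-8")   # b'e\xcc\x81'
print(nfc, nfd, nfc == nfd)   # same glyph, different bytes -> False
```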


/MachineLearning reposted

Sakana AI’s CTO says he’s ‘absolutely sick’ of transformers, the tech that powers every major AI model “You should only do the research that wouldn’t happen if you weren’t doing it.” (@thisismyhat) 🧠 @YesThisIsLion venturebeat.com/ai/sakana-ais-…


/MachineLearning reposted

Meta laid off 600 people from its Superintelligence Lab today. Many FAIR researchers, including FAIR Research Scientist Director Yuandong Tian, were affected. I think Yann LeCun will leave soon. Maybe I should raise $2B and start a new frontier lab with these folks.


/MachineLearning reposted

ARC Prize announces all validated scores on ARC-AGI. We have not verified MythWorx's 100% claim, made in the press release for their recent fundraise at a $100M valuation. We would be open to verifying their score (assuming it passes the testing policy) for the founder and their investors.


/MachineLearning reposted

Introducing the compact, dense versions of Qwen3-VL — now available in 4B and 8B pairs, each with both Instruct and Thinking variants. ✅ Lower VRAM usage ✅ Full Qwen3-VL capabilities retained ✅ Strong performance across the board Despite their size, they outperform models…
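A hypothetical usage sketch via the transformers image-text-to-text pipeline -- the model id below follows Qwen's usual naming but is my assumption, not verified from the post:

```python
# Assumed model id "Qwen/Qwen3-VL-4B-Instruct"; swap in the real one from
# the release. Requires a recent transformers with the image-text-to-text task.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3-VL-4B-Instruct")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/cat.jpg"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
print(pipe(text=messages, max_new_tokens=60))
```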


/MachineLearning reposted

Today we're sharing the next phase of Reflection. We're building frontier open intelligence accessible to all. We've assembled an extraordinary AI team, built a frontier LLM training stack, and raised $2 billion. Why Open Intelligence Matters Technological and scientific…

