
Program Counter

@program_counter

all things toward agi

Program Counter reposted

Interestingly, it was still hard to tell when AI models gain better reasoning – during pre-training, mid-training, or RL.

Researchers at @CarnegieMellon found that each of them plays distinct roles:

- RL truly improves reasoning only in specific conditions
- Generalizing across…

Program Counter reposted

My machine learning education has progressed to the point where I lose sleep tossing around a “brilliant” idea, only to find the next day that it doesn’t actually work. This is great! I talked about this a few years ago: amasad.me/carmack


Program Counter reposted

Transformers v5 redesigns tokenization.

In this blog post we talk about:
> tokenization crash course
> tokenizers and transformers - the bridge
> v5 tokenizers backend

Major shoutout to @karpathy for his BPE video that got me interested in tokenization in the first place.
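
For context, a minimal sketch of the tokenizers/transformers bridge, assuming the standard AutoTokenizer API; the "gpt2" checkpoint is just an example, not one named in the post.

```python
# Minimal sketch of the tokenizers <-> transformers bridge via AutoTokenizer.
# "gpt2" is an example checkpoint, not taken from the blog post.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # loads a fast (Rust-backed) tokenizer

ids = tok.encode("all things toward agi")     # text -> BPE token ids
print(ids)                                    # list of ints
print(tok.convert_ids_to_tokens(ids))         # the BPE pieces behind those ids
print(tok.decode(ids))                        # ids -> text round-trip
```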

Program Counter reposted

Excited to announce Seed-Prover 1.5, which is trained via large-scale agentic RL with Lean. It proved 580/660 Putnam problems and proved 11/12 in Putnam 2025 within 9 hours. Check details at github.com/ByteDance-Seed…. We will work on autoformalization towards contributing to real math!

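To make "trained with Lean" concrete: the prover's output is a machine-checkable proof script. A toy Lean 4 example of that target format (my own illustration, not Seed-Prover output):

```lean
import Mathlib

-- Toy Lean 4 theorem of the kind such a prover emits and the kernel checks.
-- Real Putnam proofs are far longer but have the same machine-checkable shape.
theorem sq_nonneg_example (a : ℤ) : 0 ≤ a * a :=
  mul_self_nonneg a
```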

Program Counter reposted

Excited to share that our paper 🌊🤺 “CFC: Simulating Character–Fluid Coupling using a Two-Level World Model” has been accepted to #SIGGRAPHASIA2025! In this work, we build a two-level world model (neural physics) for rigid-body–fluid interaction and use it to train…


Program Counter reposted

Weak AVL trees are replacements for AVL trees and red-black trees. The insertion and deletion operations are inspired by the FreeBSD implementation (sys/tree.h), with the insertion further optimized. maskray.me/blog/2025-12-1…
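
For reference, the rank rule that defines weak AVL trees, as a hypothetical Python checker (field names are mine, not from the post): missing nodes have rank -1, leaves have rank 0, and every parent-child rank difference is 1 or 2. AVL trees are exactly the WAVL trees with no 2,2-nodes, which is why WAVL can stand in for both AVL and red-black trees.

```python
# Hypothetical WAVL rank-rule checker; names are illustrative, not from the post.
class Node:
    def __init__(self, key, rank=0, left=None, right=None):
        self.key, self.rank = key, rank
        self.left, self.right = left, right

def rank(n):
    return -1 if n is None else n.rank  # missing nodes have rank -1

def is_wavl(n):
    if n is None:
        return True
    if n.left is None and n.right is None and n.rank != 0:
        return False  # every leaf must have rank 0
    # every parent-child rank difference must be 1 or 2
    return all(
        n.rank - rank(c) in (1, 2) and is_wavl(c)
        for c in (n.left, n.right)
    )
```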


Program Counter reposted

Really grateful to @GPU_MODE for the opportunity to talk about my recent Tiny TPU project: 🧵youtube.com/watch?v=kccs9x….

Lecture 88: TinyTPU (youtube.com)


Program Counter reposted

Great new video! Should we make a question based on the titans paper?


Program Counter reposted

New NanoGPT WR from @ChrisJMcCormick at 130.2s, a 1.4s improvement! He has somehow found a way to make Muon even faster, along with several other optimizations to pre-multiply lambdas, update Normuon axis on gates, and reshape matrices. github.com/KellerJordan/m….

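For context on why a faster Muon matters here: Muon's per-step cost is dominated by a Newton-Schulz orthogonalization of each gradient matrix. A sketch of that core loop, with coefficients as in the public reference implementation (the actual record-setting code lives in the linked repo):

```python
import torch

def newton_schulz(G: torch.Tensor, steps: int = 5, eps: float = 1e-7):
    # Quintic Newton-Schulz iteration at the heart of Muon: pushes the
    # gradient matrix toward the nearest (semi-)orthogonal matrix.
    # Coefficients follow the public reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.bfloat16()
    if G.size(0) > G.size(1):
        X = X.T                      # iterate on the wide orientation
    X = X / (X.norm() + eps)         # scale so the spectral norm is <= 1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if G.size(0) > G.size(1):
        X = X.T
    return X
```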

Program Counter reposted

My old causal inference reading lists:

Intro to Causal Inference (undergrad stats, Yale, 2021): stat.berkeley.edu/~winston/causa…
Causal Inference & Research Design (grad political science seminar, Yale, 2019): stat.berkeley.edu/~winston/causa…


Program Counter reposted

Check out the "Learning JAX" video series if you're interested in learning JAX!

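If you want a taste before watching: JAX's core idea is composable function transformations (grad to differentiate, jit to compile with XLA, vmap to vectorize). A minimal self-contained example:

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # mean-squared error of a linear model
    return jnp.mean((x @ w - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))   # compiled gradient w.r.t. w

w = jnp.zeros(3)
x = jnp.ones((4, 3))
y = jnp.ones(4)
print(grad_fn(w, x, y))             # dL/dw, shape (3,)
```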

Program Counter reposted

I made it into Terry Tao’s blog! terrytao.wordpress.com/2025/12/08/the… One cool part of this experience is that I *would not have made the Claude Deep Research query resulting in the connection to Erdos 106 if not for Aristotle’s exact implementation*. i.e. Aristotle, an AI tool, contributed…


Program Counter reposted

An interesting research problem that might be solvable at big labs / @thinkymachines / @wandb: with enough data on training runs, can we make universal recommendations of good hyperparams, using stats of the dataset, loss fn, activations, size, etc.? It would save so much time and compute.


Program Counter reposted

A Recipe for Transformer+++:
GQA/TPA (arxiv.org/abs/2501.06425)
+ QKRMSNorm + Output Gate (arxiv.org/abs/2505.06708)
+ GRAPE/Alibi (github.com/model-architec…)
+ KV Shifting (Shortconv/canon layer)

Enjoy it!
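
As one concrete ingredient of the recipe, a toy PyTorch sketch of QKRMSNorm (my own minimal rendering, not the papers' code): normalizing queries and keys before the dot product keeps attention logits bounded, which stabilizes training at scale.

```python
import torch
import torch.nn.functional as F

def rmsnorm(x, eps=1e-6):
    # RMSNorm without a learned scale, applied over the head dimension
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)

def qk_rmsnorm_attention(q, k, v):
    # QKRMSNorm: normalize q and k before computing attention scores
    q, k = rmsnorm(q), rmsnorm(k)
    return F.scaled_dot_product_attention(q, k, v)
```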


Program Counter reposted

If the #NeurIPS2025 app is crashing for you (like it is for me) to the point that it's unusable, here's a website with all the content/sessions: ml.ink


Program Counter reposted

I made another NeurIPS 2025 hiring list. More teams hiring research engineers, MLEs, SWEs:
@julianibarz at Tesla Optimus
@nlpmattg at @ScaledCognition
@msalbergo at Kempner Institute at Harvard
@chinwei_h at MSFT Research
@apsarathchandar at Chandar Lab
@stash_pomichter at…


Program Counter reposted

my first blogpost related to GPUs! this one looks at pyutils, a small but important part of the ThunderKittens library that allows kernels to be launched with PyTorch. enbao.me/posts/tk
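
For a sense of the mechanism such bindings wrap: PyTorch can compile and expose native functions to Python via its C++ extension API. A bare-bones hypothetical example (ThunderKittens' real kernels are CUDA and far more involved):

```python
# Hypothetical minimal binding: expose a native function to Python so
# PyTorch tensors can flow into it. A stand-in for a real kernel launch.
import torch
from torch.utils.cpp_extension import load_inline

cpp_source = """
torch::Tensor scale_by_two(torch::Tensor x) {
    return x * 2;  // stand-in for launching a real GPU kernel
}
"""

mod = load_inline(name="tk_demo", cpp_sources=cpp_source,
                  functions=["scale_by_two"])
print(mod.scale_by_two(torch.ones(3)))  # tensor([2., 2., 2.])
```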


Program Counter reposted

Even large VLAs can play ping-pong in real time! 🏓⚡️ In practice, VLAs struggle with fast, dynamic tasks: • slow reactions, jittery actions. • demos often shown at 5-10× speed to look “smooth”. We introduce VLASH: • future-state-aware asynchronous inference with >30Hz…


Program Counter reposted

Imagine the world if hardware folks settle on MXINT4 instead

Training LLMs with NVFP4 is hard because FP4 has so few values that I can fit them all in this post: ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}. But what if I told you that reducing this range even further could actually unlock better training + quantization performance? Introducing Four…

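To make that tiny value set concrete, a toy round-to-nearest quantizer onto the FP4 (E2M1) grid quoted above (illustration only, not the paper's method; real NVFP4 training also uses per-block scaling so tensors fit this range):

```python
import torch

# The 16 representable FP4 (E2M1) values quoted in the tweet.
FP4_POS = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = torch.cat([-FP4_POS.flip(0), FP4_POS])

def quantize_fp4(x: torch.Tensor) -> torch.Tensor:
    # Toy round-to-nearest onto the FP4 grid.
    idx = (x.unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return FP4_GRID[idx]

print(quantize_fp4(torch.tensor([0.7, -2.4, 5.1])))
# tensor([ 0.5000, -2.0000,  6.0000])
```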


Program Counter reposted

Excited to announce that @Azaliamirh and I are launching @RicursiveAI, a frontier AI lab creating a recursive self-improving loop between AI and the hardware that fuels it. Today, chip design takes 2-3 years and requires thousands of human experts. We will reduce that to weeks.…

Introducing Ricursive Intelligence, a frontier AI lab enabling a recursive self-improvement loop between AI and the chips that fuel it. Learn more at ricursive.com


