program_counter's profile picture. all things toward agi

Program Counter

@program_counter

all things toward agi

Program Counter reposted

TPU question: suppose I want to do a point wise operation on a buffer that is 90% padding, but the padding boundary is only known on device. How do I avoid wasting compute cycles for the padded regions?


Program Counter reposted

I ran gpt-oss over #NeurIPS2025 papers and tagged the mech interp / AI safety-ish ones. In case it's useful: docs.google.com/spreadsheets/d…


Program Counter reposted

Headed to #NeurIPS with the Reflection team this week! 👋 Keen to chat about LLMs, RL, agents, open research & science. We have a few open roles: jobs.ashbyhq.com/reflectionai


Program Counter reposted

im excited to be at NeurIPS in San Diego this week!! shoot me a DM if ur here or wanna hang! :D

rayhascode's tweet image. im excited to be at NeurIPS in San Diego this week!!

shoot me a DM if ur here or wanna hang! :D

Program Counter reposted

Together with @yuxiangw_cs and Maryam Fazel, we are excited to present our tutorial "Theoretical Insights on Training Instability in Deep Learning" tomorrow at #NeurIPS2025! Link: uuujf.github.io/instability/ *picture generated by Gemini

uuujingfeng's tweet image. Together with @yuxiangw_cs and Maryam Fazel, we are excited to present our tutorial "Theoretical Insights on Training Instability in Deep Learning" tomorrow at #NeurIPS2025!
Link: uuujf.github.io/instability/
*picture generated by Gemini

Program Counter reposted

Very interesting optimizations to push the limit of GPUs. The team will test and try to integrate it.

Releasing Alpha-MoE: Megakernel for fast Tensor Parallel Inference! Up to 200% faster execution of MoE layer in SGLang, with 17% higher average throughput on Qwen3-Next-80B, and 10% higher average throughput on DeepSeek Proud to showcase my recent work at @Aleph__Alpha🧵

SzymonOzog_'s tweet image. Releasing Alpha-MoE: Megakernel for fast Tensor Parallel Inference!

Up to 200% faster execution of MoE layer in SGLang, with 17% higher average throughput on Qwen3-Next-80B, and 10% higher average throughput on DeepSeek 

Proud to showcase my recent work at @Aleph__Alpha🧵


Program Counter reposted

For those interested in the intersection of Geometry and machine learning: Charles Fefferman (Fields Medalist) recently gave a great online talk at Harvard CMSA on "extrinsic and intrinsic manifold learning, old and new". Very interesting talk. Link: youtube.com/watch?v=XUv6re…

Saman_Habibi_E's tweet image. For those interested in the intersection of Geometry and machine learning: Charles Fefferman (Fields Medalist) recently gave a great online talk at Harvard CMSA on "extrinsic and intrinsic manifold learning, old and new". Very interesting talk. Link: youtube.com/watch?v=XUv6re…

Program Counter reposted

I will join Tsinghua University, College of AI, as an Assistant Professor in the coming month. I am actively looking for 2026 spring interns and future PhDs (ping me if you are in #NeurIPS). It has been an incredible journey of 10 years since I attended an activity organized by…

xyz2maureen's tweet image. I will join Tsinghua University, College of AI, as an Assistant Professor in the coming month. I am actively looking for 2026 spring interns and future PhDs (ping me if you are in #NeurIPS).

It has been an incredible journey of 10 years since I attended an activity organized by…

Program Counter reposted

This video by @jbhuang0604 manages to cram in all the core pieces of modern attention variants, a perfect refresher if you (like me) need a reminder of the differences between MHA, GQA, MLA, DSA etc :) youtube.com/watch?v=Y-o545…

johnowhitaker's tweet image. This video by @jbhuang0604 manages to cram in all the core pieces of modern attention variants, a perfect refresher if you (like me) need a reminder of the differences between MHA, GQA, MLA, DSA etc :)
youtube.com/watch?v=Y-o545…

Program Counter reposted

We’ve opened some more full-time & intern roles at Google DeepMind. Come work with us! DM if interested, or come find me at NeurIPS this week! ☀️


Program Counter reposted

Coming to Neurips from 5-7th! Speaking at the mechanistic interpretability workshop mechinterpworkshop.com on the 7th about unmechanistic interpretability (as requested by the organizers) 🙃🙂 While I’ll miss this, our work will be demoed at Google booth eg veo zeroshot…


Program Counter reposted

I'll be at NeurIPS 2025 in San Diego. I will be presenting our poster on Continuous Thought Machines (pub.sakana.ai/ctm) on Thursday afternoon: neurips.cc/virtual/2025/l… See you there?


Program Counter reposted

I just came across this really good summary of @YesThisIsLion and my @MLStreetTalk interview (youtu.be/DtePicx_kFY): theneuron.ai/explainer-arti… Noice.


Program Counter reposted

Super excited to share that I have joined the Applied AI, US team at @MistralAI. Grateful to @aviTwit3 and @sophiamyang for the opportunity, and thankful to the HR team, Charles, Alexandre, and Brian for making the transition seamless. A heartfelt thank you to @NirantK, Naveen…

ravithejads's tweet image. Super excited to share that I have joined the Applied AI, US team at @MistralAI. Grateful to @aviTwit3 and @sophiamyang for the opportunity, and thankful to the HR team, Charles, Alexandre, and Brian for making the transition seamless.

A heartfelt thank you to @NirantK, Naveen…

Program Counter reposted

Unlike current AI systems, brains can quickly & flexibly adapt to changing environments. This is the topic of our perspective in Nature MI (rdcu.be/eSeif), where we relate dynamical & plasticity mechanisms in the brain to in-context & continual learning in AI. #NeuroAI

DurstewitzLab's tweet image. Unlike current AI systems, brains can quickly & flexibly adapt to changing environments.
This is the topic of our perspective in Nature MI (rdcu.be/eSeif), where we relate dynamical & plasticity mechanisms in the brain to in-context & continual learning in AI. #NeuroAI

Program Counter reposted

Lumine - 64 H100 are all you need. 2026 is going to be a crazy year for robotics.

I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something? - trained Qwen2VL-7B to play genshin - SFT only, no RL - 2424 hours of human gameplay + 15k short reasoning traces to decompose the tasks - sub 20k H100 hours (3 epochs) - heaps of…

stalkermustang's tweet image. I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something?

- trained Qwen2VL-7B to play genshin
- SFT only, no RL
- 2424 hours of human gameplay + 15k short reasoning traces to decompose the tasks
- sub 20k H100 hours (3 epochs)
- heaps of…
stalkermustang's tweet image. I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something?

- trained Qwen2VL-7B to play genshin
- SFT only, no RL
- 2424 hours of human gameplay + 15k short reasoning traces to decompose the tasks
- sub 20k H100 hours (3 epochs)
- heaps of…
stalkermustang's tweet image. I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something?

- trained Qwen2VL-7B to play genshin
- SFT only, no RL
- 2424 hours of human gameplay + 15k short reasoning traces to decompose the tasks
- sub 20k H100 hours (3 epochs)
- heaps of…
stalkermustang's tweet image. I'm not sure why this new ByteDance Seed paper is not all over my feed. Am I missing something?

- trained Qwen2VL-7B to play genshin
- SFT only, no RL
- 2424 hours of human gameplay + 15k short reasoning traces to decompose the tasks
- sub 20k H100 hours (3 epochs)
- heaps of…


Program Counter reposted

The Strategic Explorations team @OpenAI is looking to recruit researchers interested in working on the next frontier of language modeling! Feel free to reach out to me by email. @darkproger and I will also be at NeurIPS to connect and discuss in person.


Program Counter reposted

train 8 neuron NN, expand it to 256 neuron NN and continue training from same accuracy and even gain a jump 1) we copy the small network’s 8 hidden units into the larger layer and duplicate the rest so the larger model computes exactly the same function as the small one at the…

Evo strategy and Hebbian-learning work together to train a network to 80% accuracy! NO backprop The script alternates between these two learning paradigms, allowing ES to explore the weight space broadly while Hebbian learning fine-tunes the solutions. -Code in comment- trained…



Program Counter reposted

After 3 weeks, we have concluded our first problem of the @GPU_MODE x @nvidia competition, NVFP4 GEMV. Thanks to everyone who has participated, we have collected over 40k submissions from >200 users. Congrats to the winners and good luck with the next problem, NVFP4 GEMM 🔥

m_sirovatka's tweet image. After 3 weeks, we have concluded our first problem of the @GPU_MODE x @nvidia competition, NVFP4 GEMV. Thanks to everyone who has participated, we have collected over 40k submissions from >200 users. Congrats to the winners and good luck with the next problem, NVFP4 GEMM 🔥

Loading...

Something went wrong.


Something went wrong.