
Michael Bukatin 🇺🇦
@ComputingByArts
Dataflow matrix machines (neuromorphic computations with linear streams). Julia, Python, Clojure, C, Processing. Shaders, ambient, psytrance, 40hz sound.
New paper! We reverse engineered the mechanisms underlying Claude Haiku’s ability to perform a simple “perceptual” task. We discover beautiful feature families and manifolds, clean geometric transformations, and distributed attention algorithms!

There is now a review by Grigory Sapunov: gonzoml.substack.com/p/tiny-recursi…
New paper 📜: Tiny Recursion Model (TRM) is a recursive reasoning approach with a tiny 7M-parameter neural network that obtains 45% on ARC-AGI-1 and 8% on ARC-AGI-2, beating most LLMs. Blog: alexiajm.github.io/2025/09/29/tin… Code: github.com/SamsungSAILMon… Paper: arxiv.org/abs/2510.04871
arxiv.org/abs/2509.21049, Physics of Learning: A Lagrangian perspective to different learning paradigms "We study the problem of building an efficient learning system. Efficient learning processes information in the least time, i.e., building a system that reaches a desired…
The paper claims that learning (an AI system learning, or machine learning in general) follows a physics-style least-action rule that unifies supervised, generative, and reinforcement learning. It shows that supervised learning, generative modeling, and reinforcement learning can all be…
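A minimal sketch of what a least-action formulation of learning looks like in general; the paper's actual Lagrangian and constraints may differ, and the symbols θ, 𝓛, S below are generic, not the paper's notation:

```latex
% Generic least-action form: a learning trajectory \theta(t) is a stationary
% point of an action functional, giving Euler-Lagrange dynamics.
S[\theta] = \int_0^T \mathcal{L}\!\left(\theta(t), \dot{\theta}(t)\right) dt,
\qquad
\frac{d}{dt}\,\frac{\partial \mathcal{L}}{\partial \dot{\theta}}
  - \frac{\partial \mathcal{L}}{\partial \theta} = 0 .
```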

"LoRA Without Regret" by @johnschulman2 et al. The most interesting finding is that one should not fine-tune attention layers, one should only fine-tune MLP layers in most situations.
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…
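A minimal sketch of that recommendation, assuming Hugging Face peft and Llama-style module names (the model id and the gate_proj/up_proj/down_proj names are illustrative assumptions, not something the post specifies):

```python
# Sketch: LoRA adapters only on the MLP (feed-forward) projections,
# leaving the attention projections (q/k/v/o_proj) untouched.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # illustrative model

lora_config = LoraConfig(
    r=16,                       # illustrative rank
    lora_alpha=32,
    target_modules=["gate_proj", "up_proj", "down_proj"],  # MLP only
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirm only MLP adapters are trainable
```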

HOW INFORMATION FLOWS THROUGH TRANSFORMERS
Because I've looked at those "transformers explained" pages and they really suck at explaining. There are two distinct information highways in the transformer architecture:
- The residual stream (black arrows): flows vertically through…
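A minimal sketch of the pathway the post describes: the residual stream runs vertically through the layers, and each sublayer reads from it and writes its output back additively (a pre-LayerNorm block is assumed here):

```python
# Minimal pre-LayerNorm transformer block, emphasizing the residual stream:
# each sublayer reads the stream, computes an update, and adds it back.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                      # x: (batch, seq, d_model) -- the residual stream
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        x = x + a                              # attention writes back into the stream
        x = x + self.mlp(self.ln2(x))          # MLP writes back into the stream
        return x
```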



KV caching overcomes statelessness in a very meaningful sense and provides a very nice mechanism for introspection (specifically of computations at earlier token positions): the Value representations can encode information from residual streams of past positions without…
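A minimal sketch of the mechanism the post points at: per-step decoding where each new query attends over keys and values cached from earlier positions (single head, no batching, all names illustrative):

```python
# Minimal single-head KV-cache decode loop: keys/values are computed once per
# position and reused; each new query attends over the whole cache.
import torch

d = 64
W_q, W_k, W_v = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(h_t):
    """h_t: residual-stream vector at the current position, shape (d,)."""
    k_cache.append(h_t @ W_k)          # cached: earlier positions are never recomputed
    v_cache.append(h_t @ W_v)
    K = torch.stack(k_cache)           # (t, d)
    V = torch.stack(v_cache)           # (t, d)
    q = h_t @ W_q
    attn = torch.softmax(K @ q / d**0.5, dim=0)   # weights over all past positions
    return attn @ V                    # mixes information carried by the cached Values

for t in range(5):
    out = decode_step(torch.randn(d))
```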
Asked Sonnet-4.5 to perform a refactor. Left it working alone. 5 minutes later, it declared victory. I committed it and started testing. Something was wrong. Many tests broke. I pointed out the issue and asked it to investigate. It worked for 3 more minutes, found and fixed a…

I really like Claude 4.5 for coding, it is fast, reliable, surgical, high-quality in a good way. I think I will use it a lot, especially for style refactors and things like that. But it is nowhere near as smart as GPT-5. I wouldn't leave it alone making large changes on HVM. Yes,…
We've entered a new phase where progress in chatbots is starting to top out but progress in automating AI research is steadily improving. It's a mistake to confuse the two.
So, with this recent trend of doubling every 4 months, and with internal model capabilities being ~6 months ahead of public releases, the internal systems at OpenAI are probably able to take on jobs that take a human a whole day. One can get plenty of AI research out of that...
Has AI progress slowed down? I’ll write some personal takes and predictions in this thread. The main metric I look at is METR’s time horizon, which measures the length of tasks agents can perform. It has been doubling for more than 6 years now, and might have sped up recently.
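A back-of-the-envelope sketch of the extrapolation in the two posts above; the doubling period and internal-vs-public lag are taken from the posts, while the current public horizon is an assumed illustrative number:

```python
# Back-of-the-envelope: extrapolate a task-length horizon that doubles every
# `doubling_months`, plus an assumed lag between internal and public models.
current_public_horizon_hours = 2.0   # assumed horizon of public models today (illustrative)
doubling_months = 4.0                # "doubling every 4 months" from the post
internal_lead_months = 6.0           # "~6 months ahead of public releases"

internal_horizon = current_public_horizon_hours * 2 ** (internal_lead_months / doubling_months)
print(f"implied internal horizon: {internal_horizon:.1f} hours")  # ~5.7 h, approaching a working day
```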

A really nice benchmark:
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total…

on grok.com, the backend sends the full (not summarized) CoT to your browser. It's not displayed in the UI, but you can see it with browser dev tools or the like; check out the JSON payload of responses from `grok.com/rest/app-chat/…{conversation_id}/load-responses`
chain-of-thought monitorability is a wonderful thing ;) gist.githubusercontent.com/nostalgebraist…
Kimi K2 tech report just dropped! Quick hits:
- MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale
- 20K+ tools, real & simulated: unlocking scalable agentic data
- Joint RL with verifiable + self-critique rubric rewards: alignment that adapts
-…
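A minimal sketch of the core Muon idea that MuonClip builds on, orthogonalizing the momentum matrix before using it as the step direction; a plain cubic Newton-Schulz iteration is used here for illustration, whereas the actual optimizer uses a tuned polynomial iteration and adds the QK-clip mechanism from the report, which is not shown:

```python
# Simplified Muon-style update for a 2D weight matrix. Illustrative only,
# not Kimi K2's MuonClip implementation.
import torch

def newton_schulz_orthogonalize(G, steps=5):
    """Approximate the orthogonal polar factor of G (basic cubic iteration)."""
    X = G / (G.norm() + 1e-7)              # normalize so the iteration converges
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def muon_step(W, momentum, grad, lr=0.02, beta=0.95):
    momentum.mul_(beta).add_(grad)                          # momentum buffer
    W.add_(newton_schulz_orthogonalize(momentum), alpha=-lr)  # orthogonalized step
    return W, momentum
```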




The very first task I usually give new pretraining people is to run a tiny transformer, profile it, and understand it deeply. I wrote up a small tutorial covering this exact workflow. I talk about how to measure GPU perf, how to estimate tensor core speedup, etc. Take a look:
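A minimal sketch of the kind of measurement that workflow involves: time a forward/backward pass with CUDA events and compare achieved FLOP/s against an assumed peak; model size, sequence length, and the peak figure below are illustrative assumptions, not the tutorial's numbers:

```python
# Rough GPU perf check for a tiny transformer: wall-clock one training step
# with CUDA events and estimate utilization against an assumed peak.
import torch
import torch.nn as nn

device = "cuda"
d_model, n_layers, seq, batch = 512, 6, 1024, 8
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, dim_feedforward=4 * d_model,
                               batch_first=True),
    num_layers=n_layers,
).to(device)
x = torch.randn(batch, seq, d_model, device=device)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
for _ in range(3):                       # warm-up
    model(x).sum().backward()
torch.cuda.synchronize()
start.record()
model(x).sum().backward()
end.record()
torch.cuda.synchronize()
ms = start.elapsed_time(end)

n_params = sum(p.numel() for p in model.parameters())
flops = 6 * n_params * batch * seq       # ~6 * params * tokens rule of thumb for fwd+bwd
peak_flops = 100e12                      # assumed ~100 TFLOP/s tensor-core peak; GPU/dtype dependent
print(f"step: {ms:.1f} ms, utilization ≈ {flops / (ms / 1e3) / peak_flops:.1%}")
```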
This paper is interesting from the perspective of metascience, because it's a serious attempt to empirically study why LLMs behave in certain ways and differently from each other. A serious attempt attacks all exposed surfaces from all angles instead of being attached to some…
New Anthropic research: Why do some language models fake alignment while others don't? Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex.

A very interesting recent work on distributed Muon (Dion): share.google/lfZ46PQPSXmRIC…