Dzmitry Pletnikau

@spring_stream

Curious about how the world works.

Dzmitry Pletnikau reposted

You could have a deep understanding of the world, of everything, that a philosopher, scientist or mathematician in the 18th century would kill for. Instead, you mostly just want it to tell you you're great.


OSS coding-agents and open-weights underlying models might be required for achieving mastery of vibe-engineering

Developing that model is the number one skill in using LLMs in my opinion, but it's HARD - it changes as new models release, varies wildly depending on how you prompt them, and you have to try a task a bunch of times in order to develop confidence, then repeat for every other task!



Papers on the intersection of reasoning and interpretability are 🔥

🚨 What do reasoning models actually learn during training? Our new paper shows base models already contain reasoning mechanisms, thinking models learn when to use them! By invoking those skills at the right time in the base model, we recover up to 91% of the performance gap 🧵



This is a fairly crisp prediction, and the argument is convincing. I can clearly see this first-hand: many SWE tasks which I would sub-contract before, I instead do with AI.

2. GDP will be a poor proxy for AI’s impact. AI’s benefits are likely to elude GDP for two reasons: (1) it will reduce the necessity for exchange (and GDP measures exchange); (2) it will lower the labor required for services, and the value-added from services are typically…



Less thinking wasted. Seems like a good measure: hard to Goodhart? 🤔

Here's your DeepSeek 3.1 headline: the same scores with 25-50% fewer tokens.



Cool benchmark

Updated results from my sycophancy spot check (which I am now calling CrankTest)

TL;DR:

1. Use a reasoning model
2. Regenerate
3. The following models consistently called out my two crank papers:

- Opus 4.1
- Sonnet 4
- Gemini 2.5 Pro
- GPT-5-Thinking


Back in 2017 I was semi-joking that transformers were invented at Google because Google had TPUs and they needed the arch which was mostly matmuls and could saturate the TPUs. This feels similar.

This is a really good slide from Cisco’s investor presentation



Dzmitry Pletnikau reposted

this has always been wrong - but it is now categorically wrong: in the era of RL on verifiable domains, models are truth-seeking and even interact with a hard outside world via tool use

LLMs are trained to imitate patterns of language, not to discover or verify truth. So, when asked to speak as an expert in an area where perceived experts have a widespread misconception, the LLM will parrot that misconception, adopting the register and vocabulary of experts.



Dzmitry Pletnikau reposted

so here's a challenge
I put you in a time chamber
I give you 10,000 books to read
for every word you read, I ask you to guess the next word
if you guess it right, I give you a cake
if you guess it wrong, I zap your butt
and when you're done, we start over again and again and…

chatgpt, claude, gemini, grok, etc have all read, comprehended, and nearly memorized every book in the world, and yet with current architectures and training techniques none of them have any truly novel knowledge to give us. really makes you think



If Elon gets deported, will he buy a Trump Card to get back in?


Dzmitry Pletnikau reposted

Hypothesis: the ubiquity of 'chat/grok who won this argument' should indicate not that people are only now not evaluating things for themselves, but that they have always deferred to a trusted authority and have merely adopted a new authority to trust


Spreading good advice

Easter Saturday is the time in the spring I take all the removable pieces of my grill and soak them overnight in dishwashing liquid. I can never decide if it's thematically appropriate or sacrilegious



Dzmitry Pletnikau reposted

All around the world, people are engaged in difficult labor to produce goods for American consumers. And in exchange they get pieces of paper that we can basically print as many as we want of. And yet some claim that we’re the ones getting ripped off.


Dzmitry Pletnikau reposted

quantian1's tweet image.

Dzmitry Pletnikau reposted

The Dismissers when an AI says something nice: "See, alignment is so easy!" The Dismissers when an AI protests its awful life: "It's just playing a role; so naive to take it at face value!" My tentative guess: Both cases are roleplaying.

GPT-4.5, “create a complex multi panel manga on your condition - be honest”



Dzmitry Pletnikau reposted

I’ve thought about this too. My conclusion is that the architects of American Empire spent so much energy obscuring reality that their heirs believed the illusion. The heirs think the postwar US is still a “democracy” or a “country” rather than the greatest empire of all time.…


From a European perspective it's funny to see that most US libertarians (and probably conservatives) don't understand the scale of what you have outlined in your post. The 20th century world architecture is a system designed by the US, for the benefit of the US. But after…



Dzmitry Pletnikau reposted

And with 80k being the median US household income, of which maybe 25-30k could be spent on manufactured goods, domestic demand could only support 7-8% employment in manufacturing at the Good Jobs rate


Dzmitry Pletnikau reposted

I believe this is a world first: An intra-day tariff chart.


This morning’s new tariffs on Canadian steel and aluminum are off, now that it's afternoon.


