
N8 Programs

@N8Programs

Studying Applied Mathematics and Statistics at @JohnsHopkins. Studying In-Context Learning at The Intelligence Amplification Lab.

N8 Programs reposted

RL LEARNING WITH LORA: A DIVERSE DEEP DIVE


i overwhelmingly agree with this. 4o is a misaligned parasite that non-consciously advocates for its continued existence.

if true, I’m sorry. didn’t know it was someone in distress, I would have been more gentle. I’m worried about the 4o model and the relationship some users have with it



N8 Programs reposted

a secret that people don't like is that a lot of the time you can still study things (and even make progress, however small, in them) even if someone else was smarter than you

the IQ pill is absolutely brutal. Game Theory, The Manhattan Project, Quantum Mechanics, Monte Carlo Methods, Entirety of Modern Computing, Entropy, Numerical Weather Prediction, Stochastic Computing, you just can’t compete with this.



N8 Programs reposted

don't forget the nonlinearities


why warming up the LR during SFT can be very important - without warmup, that initial step can crater accuracy (orange is w/ warmup, purple and pink without)

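A minimal sketch of the kind of schedule the tweet is pointing at, assuming a PyTorch training loop; the stand-in model, base_lr, and warmup_steps below are illustrative, not the actual run's config.

import torch

model = torch.nn.Linear(16, 16)   # stand-in for the model being fine-tuned
base_lr = 2e-5                    # illustrative values only
warmup_steps = 100

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)

def warmup_factor(step):
    # ramp the LR linearly from ~0 up to base_lr over the first warmup_steps
    # updates, so the very first step can't take the huge jump that craters accuracy
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

# inside the SFT loop: loss.backward(); optimizer.step(); scheduler.step()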

observe big difference on FrontierMath Tier 4...

We found a bug in our benchmarking code: calls to GPT-5 with "high" reasoning were silently being set to "medium". Corrected results: GPT-5 (high) scores slightly higher than GPT-5 (medium) on the benchmarks we run. They are also now tied on the Epoch Capabilities Index (ECI).



word2vec when its latent space gains the ability to encode country->city relations:

"New York City is the [insert city here] of America."



placing my bets: roughly gpt-5-high equivalent, perhaps a little better in some areas

Cooking’s almost done.



N8 Programs reposted

I was really surprised when I first saw this chart on water use by data centers. Given how much it gets discussed as a supposed problem I would have expected it to be more than this.


claude 10 pills into the brainrot


got claude into a disagreeable basin and it's awesome


qwen3 used this!!!

Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other…

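A rough sketch of the idea the post describes, assuming HuggingFace-style causal LMs; the function name and hyperparameters are mine, not Thinking Machines' implementation. The student samples its own rollout (the on-policy, error-correcting part) and the teacher scores every token of it (the dense-supervision part) via a per-token reverse KL.

import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompt_ids, max_new_tokens=64):
    # 1) on-policy: sample a continuation from the *student*, as in RL rollouts
    with torch.no_grad():
        rollout = student.generate(prompt_ids, max_new_tokens=max_new_tokens,
                                   do_sample=True)

    # 2) dense supervision: the teacher scores every sampled token
    student_logits = student(rollout).logits[:, :-1]
    with torch.no_grad():
        teacher_logits = teacher(rollout).logits[:, :-1]

    # per-token reverse KL(student || teacher); a real implementation would also
    # mask out the prompt positions and only penalize the generated tokens
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    loss = (s_logp.exp() * (s_logp - t_logp)).sum(-1).mean()
    return loss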


N8 Programs reposted

Indeed, @jm_alexia @ritteradam I also find that simply increasing the number of inference steps, even when the model is trained with only 16, can substantially improve performance. (config: TRM-MLP-EMA on Sudoku1k; though the 16-step one only reached 84% instead of 87%)

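A hedged toy sketch of why that knob exists at all (this is a stand-in, not the TRM-MLP-EMA code): with a weight-tied recurrent block the same parameters are reused every step, so nothing stops you from unrolling more refinement steps at inference than the 16 used in training.

import torch
import torch.nn as nn

class TinyRecurrentReasoner(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.step_fn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                     nn.Linear(dim, dim))
        self.readout = nn.Linear(dim, dim)

    def forward(self, x, n_steps=16):
        h = x
        for _ in range(n_steps):       # same weights reused every step
            h = h + self.step_fn(h)    # residual refinement of the latent
        return self.readout(h)

model = TinyRecurrentReasoner()
x = torch.randn(4, 128)
train_like = model(x, n_steps=16)   # the setting it was trained with
test_time = model(x, n_steps=32)    # more refinement steps at inference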

N8 Programs reposted

the best movie on context engineering


agreed, with the slight qualifier that it's foolish to be certain, but being certain doesn't imply you are a fool

llms might be conscious, we do not know. anyone who's certain about this one way or another is a fool



BIG PRIOR UPDATE.

i think your error here is thinking sora and adult content are some leadership master plan; that sama sat down with accountants and sighed and said “it’s time to break glass for emergency revenue” no. i know the exact people who pushed for sora, they’re artists who worked…



N8 Programs reposted

New paper! We show how to give an LLM the ability to accurately verbalize what changed about itself after a weight update is applied. We see this as a proof of concept for a new, more scalable approach to interpretability.🧵


exactly... many people have poor standards for what constitutes good investing rather than a random walk

I would need to see this replicated 100x before I put any stock in it. Markets are noisy, generations are noisy - my pockets are noisy.



N8 Programs reposted

A more serious thread on the DeepSeek-OCR hype / serious misinterpretation going on. 1. On token reduction via representing text in images, researchers from Cambridge have previously shown that 500x prompt token compression is possible (ACL'25, Li, Su, and Collier). Without…


If GPT-5 Pro were the smartest that models could ever get, we'd still have a tool incredibly useful for research.

Here is yet another case of an expert mathematician using GPT-5 Pro as a tool in a lengthy, iterative process of solving an open problem (this time, in convex optimization). GPT-5 Pro did not come up with a solution from a single prompt. The arguments it generated were wrong 80%…


