
logprob

@logprob

phd student, deep learning

Pinned

activation functions drama from: arxiv.org/pdf/1606.08415

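(For context: that arXiv link is the GELU paper, "Gaussian Error Linear Units" by Hendrycks & Gimpel. A minimal PyTorch sketch of the activation it proposes, both the exact form and the paper's tanh approximation; the function names here are mine, not from the paper or the tweet.)

```python
import math
import torch

def gelu_exact(x: torch.Tensor) -> torch.Tensor:
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation of the same function, also given in the paper
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```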

i love the fact that the model keeps learning (valid loss decrease) even when the training loss (blue) seems to plateau. there is something very deep happening in that jiggling of the training loss


logprob reposted

I spent the past month reimplementing DeepMind’s Genie 3 world model from scratch. Ended up making TinyWorlds, a 3M parameter world model capable of generating playable game environments. Demo below + everything I learned in thread (full repo at the end) 👇🏼


logprob reposted

We’re announcing a major advance in the study of fluid dynamics with AI 💧 in a joint paper with researchers from @BrownUniversity, @nyuniversity and @Stanford.


logprob reposted

Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation. We got lost in latent space. Join us 👇


logprob reposted

Really cool to see how the bigger-doesn't-mean-better trends in language models also seem to hold up for science models. Hope this means that computational physics will no longer be solely in the domain of enormous HPC clusters.

Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation. We got lost in latent space. Join us 👇



there is an urgent need for understanding the internal behavior of deep nets. (I am at a numerical analysis conference, tired of listening to talks blindly applying a random architecture to Navier-Stokes with periodic boundary conditions)


logprob reposted

The big breakthrough for convnets was the first GPU-accelerated CUDA implementation, which immediately started winning first place in image classification competitions. Remember when that happened? I do. That was Dan Ciresan in 2011

Who invented convolutional neural networks (CNNs)?

1969: Fukushima had CNN-relevant ReLUs [2].

1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100 x more costly than in 1989, and a billion x more costly than…



logprob reposted

'Mathematics compares the most diverse phenomena and discovers the secret analogies that unite them.' -- Joseph Fourier


logprob reposted

Every organism yearns to maximize its mutual information with the future. One way is to live forever, but the optimal way is to be fruitful (genetically and memetically) and multiply your bits (ideas and genes) so they take up more of the future lightcone.


don't know how this LLM thing will turn out, I'd rather bet on SciML


1 year later, still far away

it'll be agi when it's able to do 3d tikz plots in latex



logprob reposted

i'm increasingly convinced that "transformative ai" is going to look like an abundance of specialized models for everything from drug design to weather sims to robotics to supply chains, not one agent to rule them all. we're going to need a lot more ai researchers


humans are imitation machines, the base level of “intelligence” has already been achieved

LLMs are trained to imitate patterns of language, not to discover or verify truth. So, when asked to speak as an expert in an area where perceived experts have a widespread misconception, the LLM will parrot that misconception, adopting the register and vocabulary of experts.



it got navier-stokes

One nice thing you can do with an interactive world model, look down and see your footwear ... and if the model understands what puddles are. Genie 3 creation.



logprob reposted

The point of college is not to prepare you for "professional success." It's to improve your mind. Nor are these distinct paths. The people who achieve the greatest "professional success" are those who think of college as more than job training.


logprob reposted

Upgraded from Llama 3 to Qwen3 as my go-to model for research experiments, so I implemented qwen3 from scratch: github.com/rasbt/LLMs-fro…

Trade-off: Qwen3 0.6B is deeper (28 vs 16 layers) & slower than the wider Llama 3 1B but more memory efficient due to fewer params


logprob reposted

New paper on the generalization of Flow Matching arxiv.org/abs/2506.03719 🤯

Why does flow matching generalize? Did you know that the flow matching target you're trying to learn **can only generate training points**?

with @Qu3ntinB, Anne Gagneux & Rémi Emonet 👇👇👇
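(A sketch of the claim in my own notation, not necessarily the paper's: with Gaussian conditional paths $p_t(x \mid x_1)$ and conditional velocities $u_t(x \mid x_1)$, and an empirical data distribution over training points $x^{(1)}, \dots, x^{(N)}$, the marginal velocity field that flow matching regresses onto is

$$
u_t^\star(x) = \frac{\sum_{i=1}^{N} u_t\big(x \mid x^{(i)}\big)\, p_t\big(x \mid x^{(i)}\big)}{\sum_{j=1}^{N} p_t\big(x \mid x^{(j)}\big)},
$$

and integrating $\dot{x}_t = u_t^\star(x_t)$ from noise transports samples onto the empirical distribution, i.e. (up to any small terminal noise) onto the $N$ training points at $t = 1$, so any generalization has to come from the network not fitting this target perfectly.)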


logprob reposted

Is exponentially accumulating errors what LeCun said all along?

That part might be true, but it doesn't mean LLMs are 'doomed'.

Dropping the error rate from 10% to 1% (per 10min) makes 10h tasks possible.

In practice, the error rate has been halving every 4 months(!).

In…


Why can AIs code for 1h but not 10h?

A simple explanation: if there's a 10% chance of error per 10min step (say), the success rate is:

1h: 53%
4h: 8%
10h: 0.002%

@tobyordoxford has tested this 'constant error rate' theory and shown it's a good fit for the data

chance of…

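(The arithmetic behind those numbers, as a quick sketch assuming independent 10-minute steps with a constant per-step error rate; the function name is mine.)

```python
def task_success_prob(hours: float, p_error_per_step: float, step_minutes: float = 10.0) -> float:
    # Each step must succeed independently, so success compounds multiplicatively.
    steps = hours * 60.0 / step_minutes
    return (1.0 - p_error_per_step) ** steps

for hours in (1, 4, 10):
    print(f"{hours:>2}h at 10% error/step: {task_success_prob(hours, 0.10):.2%}")
# -> 53.14%, 7.98%, 0.18%

print(f"10h at  1% error/step: {task_success_prob(10, 0.01):.2%}")
# -> 54.72%, i.e. 10h tasks become feasible once the per-step error rate drops to 1%
```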


guessing how much energy chatgpt burns to spit out those emojis

