
logprob

@logprob

phd student, deep learning

Pinned

activation functions drama from: arxiv.org/pdf/1606.08415

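(For context: that arXiv link is the GELU paper, "Gaussian Error Linear Units" by Hendrycks & Gimpel. A minimal PyTorch sketch of the activation it proposes, both the exact form and the paper's tanh approximation; the function names here are mine, not from the paper or the tweet.)

```python
import math
import torch

def gelu_exact(x: torch.Tensor) -> torch.Tensor:
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation of the same function, also given in the paper
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```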

i love the fact that the model keeps learning (valid loss decrease) even when the training loss (blue) seems to plateau. there is something very deep happening in that jiggling of the training loss


logprob reposted

I spent the past month reimplementing DeepMind’s Genie 3 world model from scratch. Ended up making TinyWorlds, a 3M parameter world model capable of generating playable game environments. Demo below + everything I learned in thread (full repo at the end) 👇🏼


logprob reposted

We’re announcing a major advance in the study of fluid dynamics with AI 💧 in a joint paper with researchers from @BrownUniversity, @nyuniversity and @Stanford.


logprob reposted

Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation. We got lost in latent space. Join us 👇


logprob reposted

Really cool to see how the bigger-doesn't-mean-better trends in language models also seem to hold up for science models. Hope this means that computational physics will no longer be solely in the domain of enormous HPC clusters.

Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation. We got lost in latent space. Join us 👇



there is an urgent need for understanding the internal behavior of deep nets. (I am at a numerical analysis conference, tired of listening to talks blindly applying a random architecture to Navier-Stokes with periodic boundary conditions)


logprob reposted

The big breakthrough for convnets was the first GPU-accelerated CUDA implementation, which immediately started winning first place in image classification competitions. Remember when that happened? I do. That was Dan Ciresan in 2011

Who invented convolutional neural networks (CNNs)?

1969: Fukushima had CNN-relevant ReLUs [2].

1979: Fukushima had the basic CNN architecture with convolution layers and downsampling layers [1]. Compute was 100 x more costly than in 1989, and a billion x more costly than…



logprob reposted

'Mathematics compares the most diverse phenomena and discovers the secret analogies that unite them.' -- Joseph Fourier


logprob reposted

Every organism yearns to maximize its mutual information with the future. One way is to live forever, but the optimal way is to be fruitful (genetically and memetically) and multiply your bits (ideas and genes) so they take up more of the future lightcone.


don't know how this LLM thing will turn out, I'd rather bet on SciML


1 year later, still far away

it'll be agi when it's able to do 3d tikz plots in latex



logprob reposted

i'm increasingly convinced that "transformative ai" is going to look like an abundance of specialized models for everything from drug design to weather sims to robotics to supply chains, not one agent to rule them all. we're going to need a lot more ai researchers


humans are imitation machines, the base level of “intelligence” has already been achieved

LLMs are trained to imitate patterns of language, not to discover or verify truth. So, when asked to speak as an expert in an area where perceived experts have a widespread misconception, the LLM will parrot that misconception, adopting the register and vocabulary of experts.



it got navier-stokes

One nice thing you can do with an interactive world model, look down and see your footwear ... and if the model understands what puddles are. Genie 3 creation.



logprob reposted

The point of college is not to prepare you for "professional success." It's to improve your mind. Nor are these distinct paths. The people who achieve the greatest "professional success" are those who think of college as more than job training.


logprob reposted

Upgraded from Llama 3 to Qwen3 as my go-to model for research experiments, so I implemented qwen3 from scratch: github.com/rasbt/LLMs-fro…

Trade-off: Qwen3 0.6B is deeper (28 vs 16 layers) & slower than the wider Llama 3 1B but more memory efficient due to fewer params


logprob reposted

New paper on the generalization of Flow Matching arxiv.org/abs/2506.03719 🤯

Why does flow matching generalize? Did you know that the flow matching target you're trying to learn **can only generate training points**?

with @Qu3ntinB, Anne Gagneux & Rémi Emonet 👇👇👇
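(A sketch of the claim in my own notation, not necessarily the paper's: with Gaussian conditional paths $p_t(x \mid x_1)$ and conditional velocities $u_t(x \mid x_1)$, and an empirical data distribution over training points $x^{(1)}, \dots, x^{(N)}$, the marginal velocity field that flow matching regresses onto is

$$
u_t^\star(x) = \frac{\sum_{i=1}^{N} u_t\big(x \mid x^{(i)}\big)\, p_t\big(x \mid x^{(i)}\big)}{\sum_{j=1}^{N} p_t\big(x \mid x^{(j)}\big)},
$$

and integrating $\dot{x}_t = u_t^\star(x_t)$ from noise transports samples onto the empirical distribution, i.e. (up to any small terminal noise) onto the $N$ training points at $t = 1$, so any generalization has to come from the network not fitting this target perfectly.)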


logprob reposted

Is exponentially accumulating errors what LeCun said all along?

That part might be true, but it doesn't mean LLMs are 'doomed'.

Dropping the error rate from 10% to 1% (per 10min) makes 10h tasks possible.

In practice, the error rate has been halving every 4 months(!).

In…


Why can AIs code for 1h but not 10h?

A simple explanation: if there's a 10% chance of error per 10min step (say), the success rate is:

1h: 53%
4h: 8%
10h: 0.002%

@tobyordoxford has tested this 'constant error rate' theory and shown it's a good fit for the data

chance of…

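(The arithmetic behind those numbers, as a quick sketch assuming independent 10-minute steps with a constant per-step error rate; the function name is mine.)

```python
def task_success_prob(hours: float, p_error_per_step: float, step_minutes: float = 10.0) -> float:
    # Each step must succeed independently, so success compounds multiplicatively.
    steps = hours * 60.0 / step_minutes
    return (1.0 - p_error_per_step) ** steps

for hours in (1, 4, 10):
    print(f"{hours:>2}h at 10% error/step: {task_success_prob(hours, 0.10):.2%}")
# -> 53.14%, 7.98%, 0.18%

print(f"10h at  1% error/step: {task_success_prob(10, 0.01):.2%}")
# -> 54.72%, i.e. 10h tasks become feasible once the per-step error rate drops to 1%
```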


guessing how much energy chatgpt burns to spit out those emojis

