#expectationmaximizationalgorithm search results

Reinforcement Learning (RL) has long been the dominant method for fine-tuning, powering many state-of-the-art LLMs. Methods like PPO and GRPO explore in action space. But can we instead explore directly in parameter space? YES we can. We propose a scalable framework for…


Most RL for LLMs involves only 1 step of RL. It’s a contextual bandit problem and there’s no covariate shift because the state (question, instruction) is given. This has many implications, e.g., DAgger becomes SFT, and it is trivial to design Expectation Maximisation (EM) maximum…


Neural Architecture Search - Automating Design of Neural Networks. Predictably, AI, not humans, can architect the most performant model given a particular dataset. Neural Architecture Search (NAS) is an ML technique that automates the design of neural networks. The aim is to…

Using expectation-maximization to fit a simple mixture model, using tidy tools: rpubs.com/dgrtwo/em-tidy #rstats
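
The linked post works in R with tidy tools; as a rough companion, here is a minimal Python sketch of the same idea for a two-component 1D Gaussian mixture. The function name `em_gmm_1d`, the quartile-based initialisation, and the synthetic data are all assumptions made for illustration, not taken from the post.

```python
import numpy as np

def em_gmm_1d(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1D Gaussian mixture by EM.

    Returns (weights, means, stds). Hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    # Crude initialisation (an arbitrary choice): quartiles as starting means.
    w = np.array([0.5, 0.5])
    mu = np.percentile(x, [25, 75]).astype(float)
    sd = np.array([x.std(), x.std()])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(component k | x_i).
        dens = w * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1, keepdims=True)
        r = dens / total
        # M-step: responsibility-weighted maximum-likelihood updates.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-6)  # guard against a collapsing component
        # Log-likelihood under the parameters used in this E-step.
        ll = float(np.log(total).sum())
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, sd

# Usage on synthetic data drawn from two Gaussians.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weights, means, stds = em_gmm_1d(data)
```

The E-step assigns each point a soft membership in each component, and the M-step refits each component using those memberships as weights.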

Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute. We find simple recipes that improve the asymptote of compute scaling laws to be 5x data efficient, offering better perf w/ sufficient compute

Why does #compneuro need new learning methods? ANN models are usually trained with Gradient Descent (GD), which violates biological realities like Dale’s law and log-normal weights. Here we describe a superior learning algorithm for comp neuro: Exponentiated Gradients (EG)! 1/12
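
To make the contrast with gradient descent concrete, here is a small Python sketch of an unnormalized, sign-preserving exponentiated-gradient update on a toy linear regression. The exact update rule used in the thread's paper is not shown in the tweet, so the form `w * exp(-lr * grad * sign(w))`, the helper names, and the toy data are assumptions for illustration only.

```python
import numpy as np

def eg_step(w, grad, lr):
    """One exponentiated-gradient step (assumed form).

    The update is multiplicative, so a weight can shrink or grow
    but never crosses zero: its sign is fixed at initialisation.
    """
    return w * np.exp(-lr * grad * np.sign(w))

# Toy regression problem (synthetic, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
w_true = np.array([0.5, -1.0, 2.0, -0.1])
y = X @ w_true

def loss_and_grad(w):
    """Mean squared error (halved) and its gradient with respect to w."""
    err = X @ w - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

w = np.array([1.0, -1.0, 1.0, -1.0])  # initial signs match w_true
signs = np.sign(w)
for _ in range(2000):
    _, g = loss_and_grad(w)
    w = eg_step(w, g, lr=0.05)
final_loss, _ = loss_and_grad(w)
```

Because every step multiplies a weight by a positive factor, an initially excitatory (positive) weight stays excitatory and an inhibitory one stays inhibitory, which is the property Dale's law asks for. An additive gradient-descent step gives no such guarantee.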

Brain-like learning with exponentiated gradients biorxiv.org/cgi/content/sh… #biorxiv_neursci



And we can call this an initial success. Entropy-based injection of CoT tokens to tell the model to re-evaluate (o1 style) and inject entropy based on branching to arrive at the correct value. Argmax returns the expected "9.11 is greater than 9.9". This is L3.2 1B

Struggling to use LLMs for creative tasks? @hammer_mt talks about the powerful "Evaluator-Optimizer" pattern with GEPA+@DSPyOSS to optimize prompts for fuzzy generative tasks where evals are informal and subjective. Check out the full talk, prompt and executable notebook below!


📚 Blog Update|Building Consistent Profits: The Mathematical Concept of “Expectancy” If you want to move beyond “winning by chance” and build a repeatable trading edge, understanding the concept of expectancy is essential. In Fundora’s latest blog, we explain what expectancy is…

ML researchers just built a new ensemble technique. It even outperforms XGBoost, CatBoost, and LightGBM. Here's a complete breakdown (explained visually):


[RLHF] by Hand ✍️ Yesterday, Jan Leike (@janleike) announced he is joining #Anthropic to lead their "super-alignment" mission. He is the co-inventor of Reinforcement Learning with Human Feedback (#RLHF). How does RLHF work? [1] Given ↳ Reward Model (RM) ↳ Large Language…


*** The Most Interesting Statistical Algorithm *** [The EM Algorithm: a masterful juggler of hidden truths and incomplete data, dancing between expectation and maximization.] ~ The most interesting statistical algorithm can vary depending on who you ask and their specific…

We employ a Metropolis-Hastings (MCMC) approximate sampler, which iteratively updates a generation by partially resampling new candidates, accepting with probability depending on pᵃ. To make this approach suitable for LLMs, our algorithm integrates Metropolis-Hastings into…
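
As a toy illustration of the accept/reject step, here is a Python sketch of independence Metropolis-Hastings on a small discrete distribution. It assumes that pᵃ means a base distribution p raised to a power α and that proposals are drawn from p itself; the tweet is truncated, so this is not the authors' LLM algorithm, and the name `mh_power_sampler` is hypothetical.

```python
import random
from collections import Counter

def mh_power_sampler(p, alpha, n_steps, seed=0):
    """Sample from the distribution proportional to p(x)**alpha.

    Uses independence Metropolis-Hastings with p as the proposal.
    `p` maps each state to its (positive) base probability.
    """
    rng = random.Random(seed)
    states = list(p)
    weights = [p[s] for s in states]
    x = rng.choices(states, weights)[0]
    samples = []
    for _ in range(n_steps):
        x_new = rng.choices(states, weights)[0]
        # Target ratio (p(x_new)/p(x))**alpha times proposal ratio p(x)/p(x_new).
        accept = min(1.0, (p[x_new] / p[x]) ** (alpha - 1))
        if rng.random() < accept:
            x = x_new
        samples.append(x)
    return samples

# Usage: sharpen a three-state distribution with alpha = 2.
base = {"a": 0.5, "b": 0.3, "c": 0.2}
draws = mh_power_sampler(base, alpha=2.0, n_steps=50000)
freq = {s: c / len(draws) for s, c in Counter(draws).items()}
```

With α > 1 the chain rarely accepts moves to less likely states, so the empirical frequencies concentrate on high-probability states without ever computing the normalising constant of p^α.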


Instead of testing for bias from potential misspecification of model restrictions, it's nearly optimal to adapt, averaging restricted and unrestricted estimates using data-driven weights. shinyApp: lsun20.github.io/MissAdapt/ Paper: econometricsociety.org/publications/e…

Past methods of increasing epochs or params N overfit w/ a fixed number of web tokens. After regularizing with much higher weight decay, we instead find loss follows a clean power law. The best possible loss is the limit as N→♾, which we estimate via the scaling law asymptote
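
One way to read "estimate via the scaling law asymptote" is sketched below in Python. The functional form loss ≈ E + A·N^(-α), the grid search over α, the function name `fit_asymptote`, and the synthetic numbers are all assumptions for illustration; the thread does not state the exact parameterisation or fitting procedure.

```python
import numpy as np

def fit_asymptote(n, loss, alphas=None):
    """Fit loss ≈ E + A * n**(-alpha) and return (E, A, alpha).

    E is the estimated loss in the limit n → infinity.
    Hypothetical helper: grid over alpha, linear least squares for (E, A).
    """
    n = np.asarray(n, dtype=float)
    loss = np.asarray(loss, dtype=float)
    if alphas is None:
        alphas = np.linspace(0.05, 1.0, 191)
    best = None
    for alpha in alphas:
        # For a fixed exponent the model is linear in (E, A).
        design = np.column_stack([np.ones_like(n), n ** (-alpha)])
        coef, *_ = np.linalg.lstsq(design, loss, rcond=None)
        sse = float(np.sum((design @ coef - loss) ** 2))
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], alpha)
    _, E, A, alpha = best
    return float(E), float(A), float(alpha)

# Usage on noise-free synthetic losses (made-up values, not from the thread).
params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = 3.0 + 50.0 * params ** (-0.3)
E_hat, A_hat, alpha_hat = fit_asymptote(params, losses)
```

The fitted constant term is the quantity of interest here: it is the loss the curve approaches as N grows without bound.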

Curious how it works? Check out this demo where the model solves a tricky probability problem.


A rough estimate actually doesn’t hurt---applying a "scaling law" or something similar, like the lightweight embedding-augmented linear proxy model we used for QLoRA (even cheaper than scaling-law methods! see arxiv.org/pdf/2505.01449, with an update coming soon), is super cheap…


Every day we’re pushing the frontier

GPT-5 Pro found a counterexample to the NICD-with-erasures majority optimality (Simons list, p.25). simons.berkeley.edu/sites/default/… At p=0.4, n=5, f(x) = sign(x_1-3x_2+x_3-x_4+3x_5) gives E|f(x)|=0.43024 vs best majority 0.42904.



dang I want in on the infinite stanford compute 🤤

Since compute grows faster than the web, we think the future of pre-training lies in the algorithms that will best leverage ♾ compute. We find simple recipes that improve the asymptote of compute scaling laws to be 5x data efficient, offering better perf w/ sufficient compute


When the parent Z of X_i (i=1,2,3) is an unobserved #latentVariable, X_i and the estimates of the parameters are no longer independent. We use the #ExpectationMaximizationAlgorithm to solve maximum likelihood problems involving a #latentVariable. cs.cmu.edu/~lebanon/pub/b… #readingOfTheDay
