#scalablemachinelearning search results
1/n) Scaling across massive datasets can be achieved via hierarchical or federated approaches. Localized models process partitioned subsets, updating global baselines incrementally.
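A minimal sketch of the federated idea described above, assuming NumPy arrays stand in for model weights; the helper names (`local_update`, `federated_round`) and the toy "training" step are illustrative, not from the thread:

```python
# Minimal federated-averaging sketch (hypothetical helper names), illustrating how
# locally trained models on partitioned subsets update a shared global baseline.
import numpy as np

def local_update(global_weights: np.ndarray, shard: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Toy local step: nudge weights toward the shard mean (stand-in for real training)."""
    return global_weights + lr * (shard.mean(axis=0) - global_weights)

def federated_round(global_weights: np.ndarray, shards: list) -> np.ndarray:
    """One round: each partition trains locally, then the local models are averaged."""
    local_weights = [local_update(global_weights, s) for s in shards]
    return np.mean(local_weights, axis=0)

rng = np.random.default_rng(0)
shards = [rng.normal(loc=i, size=(100, 4)) for i in range(3)]  # partitioned subsets
w = np.zeros(4)
for _ in range(10):
    w = federated_round(w, shards)  # incremental update of the global baseline
print(w)
```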
Learn how to scale AI with confidence: Schedule a demo: hubs.li/Q03YJcZX0 #ScalableAI #HealthcareAI #AIAdoption #LLM #ClinicalAI #JohnSnowLabs
They find power-law scaling between pretraining data and downstream manipulation error. Meaning you can now predict robot performance from data scale. That’s normal for LLMs. It’s new for robots.
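A hedged sketch of what "power-law scaling" buys you in practice: fit error ≈ a · D^(-b) in log-log space and extrapolate to a larger data scale. The data points below are made up for illustration, not taken from the paper:

```python
# Fit a power law  error ≈ a * D**(-b)  to illustrative (dataset size, error) pairs
# via least squares in log-log space, then extrapolate to a larger data scale.
import numpy as np

D = np.array([1e4, 3e4, 1e5, 3e5, 1e6])          # pretraining dataset sizes (made up)
err = np.array([0.42, 0.31, 0.22, 0.16, 0.11])   # downstream manipulation error (made up)

slope, intercept = np.polyfit(np.log(D), np.log(err), 1)
a, b = np.exp(intercept), -slope
print(f"error ≈ {a:.2f} * D^(-{b:.2f})")
print("predicted error at D=1e7:", a * 1e7 ** (-b))  # predict performance from data scale
```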
They derive an interesting scaling law for the method: the failure rate drops exponentially as you scale the number of candidates N and/or comparisons per match K. This is under two assumptions: the LLM generates a correct solution with non-zero probability and ...
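A back-of-the-envelope illustration of the exponential claim (the per-candidate success probability p and the independence assumption below are ours, and the paper's full bound also involves the K comparisons per match):

```python
# If each of N sampled candidates is correct with probability p > 0 and candidates are
# independent, the chance that no candidate is correct is (1 - p)**N, which decays
# exponentially in N.
p = 0.05  # assumed per-candidate success probability
for N in (1, 8, 64, 512):
    print(N, (1 - p) ** N)  # e.g. N=512 -> ~4e-12
```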
Pre-training under infinite compute • Data, not compute, is the new bottleneck • Standard recipes overfit → fix with strong regularization (30× weight decay) • Scaling laws: loss decreases monotonically, best measured by asymptote not fixed budget
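A hedged sketch of "measure by asymptote": fit a saturating law L(D) = E + A · D^(-α) and report the asymptote E rather than the loss at any fixed budget. The loss values below are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def loss_law(D, E, A, alpha):
    # Loss as a function of data D with an irreducible asymptote E.
    return E + A * D ** (-alpha)

D = np.array([1e8, 3e8, 1e9, 3e9, 1e10])     # token counts (made up)
L = np.array([3.10, 2.85, 2.66, 2.53, 2.44])  # eval losses (made up)
(E, A, alpha), _ = curve_fit(loss_law, D, L, p0=(2.0, 100.0, 0.3), maxfev=10000)
print(f"asymptote E ≈ {E:.2f}, exponent alpha ≈ {alpha:.2f}")
```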
This is the best public resource on scaling hardware for AI, and it's free. "How to Scale Your Model" is the bible from Google DeepMind that covers the math, systems, and scaling laws for LLM training and inference workloads. Approachable yet thorough. Absolute must-read.
Large Language Models are getting smarter AND smaller. This video breaks down how AI researchers have made this possible. 00:00 Overview 00:20 Scaling Law 1.0 01:48 Using smaller weights 03:44 Increasing model training 06:05 Google Titans 07:40 What's after Transformers? 08:45…
Scaling is incredibly hard and demanding and leaves very little room for error in every little part of the training stack. But once it works, it's beautiful to see.
We’ve answered the three remaining open questions @kellerjordan0 posted at kellerjordan.github.io/posts/muon/: - Yes, Muon scales to larger compute. - Yes, Muon can be trained on a large cluster. - Yes, Muon works for SFT. RL probably works too; we leave it to the audience as an exercise :)
🚀 Introducing our new tech report: Muon is Scalable for LLM Training We found that the Muon optimizer can be scaled up using the following techniques: • Adding weight decay • Carefully adjusting the per-parameter update scale ✨ Highlights: • ~2x computational efficiency vs AdamW…
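A simplified, hedged sketch of a Muon-style update along the lines the tweet describes: orthogonalize the momentum of a 2-D weight matrix (here via a basic cubic Newton-Schulz iteration rather than the exact iteration in the report), apply decoupled weight decay, and rescale the update by parameter shape. The scale rule and hyperparameters are illustrative assumptions, not the report's:

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 10) -> torch.Tensor:
    X = G / (G.norm() + 1e-7)            # Frobenius normalization keeps the iteration stable
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # cubic Newton-Schulz toward the orthogonal factor
    return X

def muon_style_step(W, momentum, grad, lr=0.02, beta=0.95, weight_decay=0.01):
    momentum.mul_(beta).add_(grad)
    update = newton_schulz_orthogonalize(momentum)
    update *= max(W.shape[0], W.shape[1]) ** 0.5  # per-shape update scale (assumed form)
    W.mul_(1 - lr * weight_decay)                 # decoupled weight decay
    W.add_(update, alpha=-lr)

W = torch.randn(64, 32)
m = torch.zeros_like(W)
g = torch.randn_like(W)
muon_style_step(W, m, g)
```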
This may be the most important figure in LLM research since the OG Chinchilla scaling law in 2022. The key insight is 2 curves working in tandem. Not one. People have been predicting a stagnation in LLM capability by extrapolating the training scaling law, yet they didn't…
📢New research on mechanistic architecture design and scaling laws. - We perform the largest scaling laws analysis (500+ models, up to 7B) of beyond Transformer architectures to date - For the first time, we show that architecture performance on a set of isolated token…
Simple and Scalable Strategies to Continually Pre-train Large Language Models Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models.
MosaicML announces Beyond Chinchilla-Optimal Accounting for Inference in Language Model Scaling Laws paper page: huggingface.co/papers/2401.00… Large language model (LLM) scaling laws are empirical formulas that estimate changes in model quality as a result of increasing parameter…
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Modifies the scaling laws to calculate the optimal LLM parameter count and pre-training data size to train and deploy a model of a given quality and inference demand arxiv.org/abs/2401.00448
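A hedged sketch of the accounting idea: total compute is roughly training FLOPs (≈ 6·N·D_train) plus lifetime inference FLOPs (≈ 2·N·D_inference), so high inference demand pushes the optimum toward smaller models trained on more tokens. The two configurations below are assumed, for illustration only, to reach similar quality:

```python
def total_flops(n_params: float, train_tokens: float, inference_tokens: float) -> float:
    # Common approximations: ~6 FLOPs/param/token for training, ~2 for inference.
    return 6 * n_params * train_tokens + 2 * n_params * inference_tokens

chinchilla_style = total_flops(70e9, 1.4e12, inference_tokens=2e12)
smaller_longer = total_flops(30e9, 4.0e12, inference_tokens=2e12)
print(f"70B / 1.4T train tokens: {chinchilla_style:.2e} total FLOPs")
print(f"30B / 4.0T train tokens: {smaller_longer:.2e} total FLOPs")
```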
Can we predict the future capabilities of AI? @ilyasut, co-founder and chief scientist of OpenAI, on the scaling law and an interesting area of research: (1) The scaling law relates the input of the neural network to simple-to-evaluate performance measures like next-word…
A counterintuitive implication of scale: trying to solve a more general version of the problem is an easier way to solve the original problem than directly tackling it. Attempting a more general problem encourages you to come up with a more general and simpler approach. This…
We had 1) the RMT paper on scaling Transformers to 1M tokens, and 2) the convolutional Hyena LLM for 1M tokens. How about 3) LONGNET: Scaling Transformers to 1 Billion Tokens (arxiv.org/abs/2307.02486)!? It achieves linear (vs quadratic) scaling via dilated (vs self) attention.
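A hedged sketch of the dilated-attention idea (not the LONGNET implementation): split the sequence into segments of length w and, within each segment, let each query attend only to positions sharing its offset modulo a dilation rate r. Each token then attends to roughly w/r positions rather than all L, so total work grows linearly in L instead of quadratically:

```python
import numpy as np

def dilated_attention_mask(L: int, w: int, r: int) -> np.ndarray:
    mask = np.zeros((L, L), dtype=bool)
    for start in range(0, L, w):
        idx = np.arange(start, min(start + w, L))    # one segment of length w
        for q in idx:
            mask[q, idx[idx % r == q % r]] = True    # same segment, same dilation offset
    return mask

mask = dilated_attention_mask(L=32, w=8, r=2)
print("attended positions per query:", mask.sum(axis=1)[:8])  # ~w/r = 4 each
print("total attended pairs:", mask.sum(), "vs dense:", 32 * 32)
```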
RT Machine Learning on Graphs, Part 3 dlvr.it/S8lLdv #graphmachinelearning #scalablemachinelearning #graphkernels
Training large models demands a lot of GPU memory and a long training time. With several training parallelism strategies and a variety of memory saving designs, it is possible to train very large neural networks across many GPUs. lilianweng.github.io/lil-log/2021/0…
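One concrete memory-saving design from that family, as a short sketch: activation (gradient) checkpointing in PyTorch, which drops intermediate activations in the forward pass and recomputes them during backward, trading extra compute for memory. The toy model below is an assumption for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# A stack of blocks standing in for a large model.
blocks = nn.ModuleList([nn.Sequential(nn.Linear(1024, 1024), nn.GELU()) for _ in range(8)])

def forward_with_checkpointing(x: torch.Tensor) -> torch.Tensor:
    for block in blocks:
        # Activations inside each block are recomputed on backward instead of stored.
        x = checkpoint(block, x, use_reentrant=False)
    return x

x = torch.randn(16, 1024, requires_grad=True)
forward_with_checkpointing(x).sum().backward()
```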
RT Explicit feature maps for non-linear kernel functions dlvr.it/S6t26D #scikitlearn #machinelearning #scalablemachinelearning #kerneltrick
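A hedged sketch of the explicit-feature-map idea with scikit-learn: approximate an RBF kernel with random Fourier features (RBFSampler), then train a plain linear model on the transformed features so kernel-like behavior scales to large datasets. The dataset and hyperparameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
model = make_pipeline(
    RBFSampler(gamma=0.5, n_components=300, random_state=0),  # explicit feature map
    SGDClassifier(loss="hinge", random_state=0),              # linear SVM on mapped features
)
model.fit(X, y)
print("train accuracy:", model.score(X, y))
```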