DeepSpeed

@DeepSpeedAI

Official account for DeepSpeed, a library that enables unprecedented scale and speed for deep learning training + inference. Japanese: @DeepSpeedAI_JP

Pinned

UIUC, Anyscale, and Snowflake significantly enhanced LLM offloading for the Superchip era!

🚀 SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips Superchips like the NVIDIA GH200 offer tightly coupled GPU-CPU architectures for AI workloads. But most existing offloading techniques were designed for traditional PCIe-based systems. Are we truly…



DeepSpeed reposted

🚨Meetup Alert🚨 Join us for @raydistributed × @DeepSpeedAI Meetup: AI at Scale, including talks from researchers and engineers at @LinkedIn, @anyscalecompute and @Snowflake. Learn how leading AI teams are scaling efficiently with Ray’s distributed framework and DeepSpeed’s…


Step into the future of AI at #PyTorchCon 2025, Oct 22–23 in San Francisco 🔥 Join the DeepSpeed keynote and technical talks. Register: events.linuxfoundation.org/pytorch-confer… + Oct 21 co-located events: Measuring Intelligence, Open Agent & AI Infra Summits / Startup Showcase & PyTorch Training


DeepSpeed reposted

The @DeepSpeedAI team would like to thank @modal for sponsoring our GPUs for CI. This is an amazing contribution to our AI-democratizing open source project. github.com/deepspeedai/De… The Modal team is outstanding in their support - speed, expertise, and a human experience!


ZenFlow is a massive improvement to DeepSpeed Offloading. Courtesy of an excellent collaboration among University of Virginia, UC Merced, Argonne National Laboratory, Microsoft, and Snowflake.

Introducing #ZenFlow: No Compromising Speed for #LLM Training w/ Offloading
5× faster LLM training with offloading
85% fewer GPU stalls
2× lower I/O overhead
🚀 Blog: hubs.la/Q03DJ6GJ0
🚀 Try ZenFlow and experience 5× faster training with offloading: hubs.la/Q03DJ6Vb0



Kudos to Xinyu for giving an excellent presentation of the DeepSpeed Universal Checkpointing (UCP) paper at USENIX ATC 2025.

📢 Yesterday at USENIX ATC 2025, Xinyu Lian from UIUC SSAIL Lab presented our paper on Universal Checkpointing (UCP). UCP is a new distributed checkpointing system designed for today's large-scale DNN training, where models often use complex forms of parallelism, including data,…



DeepSpeed reposted

My first project at @Snowflake AI Research is complete!
I present to you Arctic Long Sequence Training (ALST)
Paper: arxiv.org/abs/2506.13996
Blog: snowflake.com/en/engineering…
ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million…


Improved DeepNVMe: Affordable I/O Scaling for AI
- Faster I/O with PCIe Gen5
- 20x faster model checkpointing
- Low-budget SGLang inference via NVMe offloading
- Pinned memory for CPU-only workloads
- Zero-copy tensor type casting
Blog: tinyurl.com/yanbrjy9


DeepSpeed reposted

PyTorch Foundation has expanded into an umbrella foundation. @vllm_project and @DeepSpeedAI have been accepted as hosted projects, advancing community-driven AI across the full lifecycle. Supporting quotes provided by the following members: @AMD, @Arm, @AWS, @Google, @Huawei,…


Come hear all the exciting DeepSpeed updates at the upcoming PyTorch Day France 2025
DeepSpeed – Efficient Training Scalability for Deep Learning Models - sched.co/21nyy @sched


Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations.
- Automatic parallelization & profile-guided optimizations
- Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes
- 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading
tinyurl.com/8cys28xk
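For illustration, a minimal sketch of how DeepCompile is meant to be switched on, assuming the "compile"/"deepcompile" config key and the engine.compile() call described in the linked blog; the model and batch settings are placeholders.

# Sketch: enabling DeepCompile on a ZeRO-3 run (config key and compile() call assumed from the blog).
import deepspeed
from transformers import AutoModelForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 3},
    "compile": {"deepcompile": True},  # assumed key: apply ZeRO/offloading as compiler passes
}

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
engine.compile()  # assumed API: capture the graph and run profile-guided optimization passes
# The training loop itself stays the same: engine(...), engine.backward(loss), engine.step().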


AutoTP + ZeRO Training for HF Models
- Enhance HF post-training with larger models, batches, & contexts
- 4x faster LLAMA3 fine-tuning with TP=2 vs TP=1
- No code changes needed
Blog: tinyurl.com/5n8nfs2w
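For illustration, a rough sketch of an AutoTP + ZeRO setup driven purely by the DeepSpeed config; the "tensor_parallel"/"autotp_size" keys are taken as assumptions from the linked blog, and the model id and hyperparameters are placeholders.

# Sketch: AutoTP + ZeRO for an HF model, configured rather than coded.
import deepspeed
from transformers import AutoModelForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {"stage": 1},      # ZeRO handles the data-parallel dimension
    "tensor_parallel": {"autotp_size": 2},  # assumed key: shard each transformer layer across 2 GPUs
}

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # placeholder model id
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# The training loop is unchanged (engine(...), engine.backward(loss), engine.step());
# "no code changes" refers to the model code - the parallelization is chosen in the config.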


DeepSpeed reposted

1/4⚡️nanotron now supports DoMiNo with intra-layer communication overlapping, achieving 60% communication hiding for tensor parallelism (TP) in both the forward and backward passes while maintaining the same training loss.


DeepSpeed reposted

🚀 Excited to introduce DeepSpeed, a deep learning optimization library from @Microsoft! It simplifies distributed training and inference, making AI scaling more efficient and cost-effective. Learn more 👉 hubs.la/Q0351DJC0 #DeepSpeed #AI #OpenSource #LFAIData


🚀Introducing Ulysses-Offload🚀
- Unlock the power of long context LLM training and finetuning with our latest system optimizations
- Train LLaMA3-8B on 2M tokens context using 4xA100-80GB
- Achieve over 55% MFU
Blog: shorturl.at/Spx6Y
Tutorial: shorturl.at/bAWu5


Introducing Domino: a novel zero-cost communication tensor parallelism (TP) training engine for both single node and multi-node settings.
- Near-complete communication hiding
- Novel multi-node scalable TP solution
Blog: github.com/microsoft/Deep…


Great to see the amazing DeepSpeed optimizations from @Guanhua_Wang_, Heyang Qin, @toh_tana, @QuentinAnthon15, and @samadejacobs presented by @ammar_awan at MUG '24.

Dr. Ammar Ahmad Awan from Microsoft DeepSpeed giving a presentation at MUG '24 on Trillion-parameter LLMs and optimization with MVAPICH. @OSUengineering @Microsoft @OhTechCo @mvapich @MSFTDeepSpeed @MSFTDeepSpeedJP #MUG24 #MPI #AI #LLM #DeepSpeed



Announcing that DeepSpeed now runs natively on Windows. This exciting combination unlocks DeepSpeed optimizations for Windows users and empowers more people and organizations with AI innovations.
- HF Inference & Finetuning
- LoRA
- CPU Offload
Blog: shorturl.at/a7TF8
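For readers trying this out, a minimal sketch of fine-tuning with ZeRO-2 CPU offload; the ZeRO keys are the standard documented config options, while the model name and batch sizes are placeholders.

# Minimal sketch: fine-tuning with ZeRO-2 and CPU optimizer offload.
import deepspeed
import torch
from transformers import AutoModelForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},  # CPU Offload
    },
}

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Standard training step through the engine: forward, backward, step.
batch = torch.randint(0, model.config.vocab_size, (1, 128), device=engine.device)
loss = engine(input_ids=batch, labels=batch).loss
engine.backward(loss)
engine.step()

Launched with the deepspeed CLI (e.g. deepspeed train.py), this keeps optimizer state and updates in pinned CPU memory while the forward and backward passes stay on the GPU.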


DeepSpeed reposted

💡Check out Comet’s latest integration with DeepSpeed, a deep learning optimization library! 🤝With the @MSFTDeepSpeed + @Cometml integration, automatically start logging training metrics generated by DeepSpeed. Try the quick-start Colab to get started: colab.research.google.com/github/comet-m…
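For illustration, a small sketch of wiring Comet into the DeepSpeed monitoring config; the "comet" keys shown mirror DeepSpeed's other monitors (tensorboard/wandb), and the project and experiment names are placeholders, so check the Colab for the exact schema.

# Sketch: streaming DeepSpeed engine metrics to Comet via the monitoring section of the config.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 1},
    "comet": {
        "enabled": True,
        "project": "deepspeed-demo",     # placeholder Comet project
        "experiment_name": "zero1-run",  # placeholder experiment name
    },
}
# Pass ds_config to deepspeed.initialize(...) as usual; throughput and loss metrics emitted by
# the engine are then logged to Comet automatically during training.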


Introducing DeepNVMe, a suite of optimizations for fast and efficient I/O operations in DL applications.
- POSIX-style APIs
- Direct HBM/NVMe xfers via NVIDIA GDS
- Cheap Inference scaling via NVMe-Offload
Blog: shorturl.at/l7Oue
@Azure @NVIDIADC #FMS24 #GPUDirect
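For illustration, a small sketch of the POSIX-style handle API, assuming the aio_handle constructor defaults and the async_pread/async_pwrite/wait methods from the DeepNVMe tutorial; the NVMe file path is a placeholder.

# Sketch: async write/read of a pinned CPU tensor to a local NVMe file with a DeepNVMe aio handle.
import torch
from deepspeed.ops.op_builder import AsyncIOBuilder

aio_ops = AsyncIOBuilder().load()
h = aio_ops.aio_handle()  # assumed defaults for block size, queue depth, submission mode, threads

# Pinned (page-locked) CPU buffers so the I/O engine can transfer directly from/to them.
src = torch.empty(1024 * 1024, dtype=torch.uint8, pin_memory=True).random_()
dst = torch.empty(1024 * 1024, dtype=torch.uint8, pin_memory=True)

h.async_pwrite(src, "/local_nvme/example.bin")  # placeholder path on an NVMe mount
h.wait()                                        # block until the write completes

h.async_pread(dst, "/local_nvme/example.bin")
h.wait()
assert torch.equal(src, dst)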

