DeepSpeed
@MSFTDeepSpeed
Official account for @Microsoft DeepSpeed, a library that enables unprecedented scale and speed for deep learning training + inference. Japanese: @MSFTDeepSpeedJP
UIUC, Anyscale, and Snowflake significantly enhanced LLM offloading for the Superchip era!
🚀 SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips Superchips like the NVIDIA GH200 offer tightly coupled GPU-CPU architectures for AI workloads. But most existing offloading techniques were designed for traditional PCIe-based systems. Are we truly…
It was great to share the most recent updates from the DeepSpeed project at #PyTorchCon. We will continue pushing the boundaries of LLM distributed training for the OSS community.
🎙️ Mic check: Tunji Ruwase, Lead, DeepSpeed Project & Principal Engineer at Snowflake, is bringing the 🔥 to the keynote stage at #PyTorchCon! Get ready for big ideas and deeper learning October 22–23 in San Francisco. 👀 Speakers: hubs.la/Q03GPYFn0 🎟️…
🚨Meetup Alert🚨 Join us for @raydistributed × @DeepSpeedAI Meetup: AI at Scale, including talks from researchers and engineers at @LinkedIn, @anyscalecompute and @Snowflake. Learn how leading AI teams are scaling efficiently with Ray’s distributed framework and DeepSpeed’s…
Step into the future of AI at #PyTorchCon 2025, Oct 22–23 in San Francisco 🔥 Join the DeepSpeed keynote and technical talks. Register: events.linuxfoundation.org/pytorch-confer… + Oct 21 co-located events: Measuring Intelligence, Open Agent & AI Infra Summits / Startup Showcase & PyTorch Training
The @DeepSpeedAI team would like to thank @modal for sponsoring GPUs for our CI. This is an amazing contribution to our AI-democratizing open-source project. github.com/deepspeedai/De… The Modal team has been outstanding in their support: speed, expertise, and a human experience!
ZenFlow is a massive improvement to DeepSpeed Offloading. Courtesy of an excellent collaboration among University of Virginia, UC Merced, Argonne National Laboratory, Microsoft, and Snowflake.
Introducing #ZenFlow: No Compromising Speed for #LLM Training w/ Offloading 5× faster LLM training with offloading 85% less GPU stalls 2× lower I/O overhead 🚀 Blog: hubs.la/Q03DJ6GJ0 🚀 Try ZenFlow and experience 5× faster training with offloading: hubs.la/Q03DJ6Vb0
Kudos to Xinyu for giving an excellent presentation of the DeepSpeed Universal Checkpointing (UCP) paper at USENIX ATC 2025.
📢 Yesterday at USENIX ATC 2025, Xinyu Lian from UIUC SSAIL Lab presented our paper on Universal Checkpointing (UCP). UCP is a new distributed checkpointing system designed for today's large-scale DNN training, where models often use complex forms of parallelism, including data,…
My first project at @Snowflake AI Research is complete! I present to you Arctic Long Sequence Training (ALST) Paper: arxiv.org/abs/2506.13996 Blog: snowflake.com/en/engineering… ALST is a set of modular, open-source techniques that enable training on sequences up to 15 million…
Improved DeepNVMe: Affordable I/O Scaling for AI - Faster I/O with PCIe Gen5 - 20x faster model checkpointing - Low-budget SGLang inference via NVMe offloading - Pinned memory for CPU-only workloads - Zero-copy tensor type casting Blog: tinyurl.com/yanbrjy9
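For context, the I/O path DeepNVMe accelerates is the same one used by ZeRO-Infinity NVMe offloading. Below is a minimal, hedged sketch of such a config; the paths, tuning values, and model are placeholders, and the async_io op must be available on the system.

```python
# Sketch: ZeRO-Infinity config offloading parameters and optimizer state to NVMe,
# the I/O path that DeepNVMe accelerates. Values are illustrative, not tuned.
# Launch with the `deepspeed` launcher; requires the libaio-backed async_io op.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},      # placeholder path
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},  # placeholder path
    },
    # Async I/O tuning for the NVMe path (placeholder values).
    "aio": {"block_size": 1048576, "queue_depth": 8, "thread_count": 1},
}

model = torch.nn.Linear(4096, 4096)
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```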
PyTorch Foundation has expanded into an umbrella foundation. @vllm_project and @DeepSpeedAI have been accepted as hosted projects, advancing community-driven AI across the full lifecycle. Supporting quotes provided by the following members: @AMD, @Arm, @AWS, @Google, @Huawei,…
Come hear all the exciting DeepSpeed updates at the upcoming PyTorch Day France 2025 DeepSpeed – Efficient Training Scalability for Deep Learning Models - sched.co/21nyy @sched
Introducing 🚀DeepCompile🚀: compiler-based distributed training optimizations. - Automatic parallelization & profile-guided optimizations - Enable ZeRO1, ZeRO3, Offloading, etc. via compiler passes - 1.2X-7X speedups over manual ZeRO1/ZeRO3/Offloading tinyurl.com/8cys28xk
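A rough sketch of how DeepCompile is switched on, based on the blog: the "compile"/"deepcompile" config key and the engine.compile() call are assumptions taken from that post and may differ by release.

```python
# Sketch (assumed keys from the DeepCompile blog): enable compiler-based
# ZeRO-3 optimization passes via the config, then compile the engine.
import torch
import deepspeed

model = torch.nn.Linear(4096, 4096)

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "compile": {"deepcompile": True},  # assumed key; check the DeepCompile docs
}

engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
engine.compile()  # apply the torch.compile-based optimization passes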
AutoTP + ZeRO Training for HF Models - Enhance HF post-training with larger models, batches, & contexts - 4x faster LLAMA3 fine-tuning with TP=2 vs TP=1 - No code changes needed Blog: tinyurl.com/5n8nfs2w
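A hedged sketch of what the combined config might look like: the "tensor_parallel"/"autotp_size" keys follow the blog and may differ by release, while the ZeRO and precision sections are standard DeepSpeed config.

```python
# Sketch: DeepSpeed config combining ZeRO data parallelism with AutoTP tensor
# parallelism for HF fine-tuning. "tensor_parallel"/"autotp_size" are assumed
# keys from the blog; the rest is standard DeepSpeed config.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 1},
    "tensor_parallel": {"autotp_size": 2},  # assumed key: shard each layer across 2 GPUs
    "bf16": {"enabled": True},
}
```

This dict can then be handed to Hugging Face's `TrainingArguments(deepspeed=ds_config, ...)`, which is why no model code changes are needed.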
1/4⚡️nanotron now supports Domino with intra-layer communication overlapping, achieving 60% communication hiding for tensor parallelism (TP) in both the forward and backward passes while maintaining the same training loss.
🚀 Excited to introduce DeepSpeed, a deep learning optimization library from @Microsoft! It simplifies distributed training and inference, making AI scaling more efficient and cost-effective. Learn more 👉 hubs.la/Q0351DJC0 #DeepSpeed #AI #OpenSource #LFAIData
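For readers new to the library, here is a minimal training-loop sketch using the public deepspeed.initialize API; the model, batch size, and config values are illustrative.

```python
# Minimal DeepSpeed training sketch: wrap a PyTorch model with deepspeed.initialize
# and let the engine handle mixed precision, ZeRO partitioning, and the optimizer.
# Launch with:  deepspeed train.py
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

for step in range(10):
    x = torch.randn(8, 1024, device=engine.device, dtype=torch.half)
    y = torch.randint(0, 10, (8,), device=engine.device)
    loss = torch.nn.functional.cross_entropy(engine(x), y)
    engine.backward(loss)  # handles loss scaling and gradient partitioning
    engine.step()
```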
🚀Introducing Ulysses-Offload🚀 - Unlock the power of long context LLM training and finetuning with our latest system optimizations - Train LLaMA3-8B on 2M tokens context using 4xA100-80GB - Achieve over 55% MFU Blog: shorturl.at/Spx6Y Tutorial: shorturl.at/bAWu5
Introducing Domino: a novel zero-cost communication tensor parallelism (TP) training engine for both single node and multi-node settings. - Near-complete communication hiding - Novel multi-node scalable TP solution Blog: github.com/microsoft/Deep…
Great to see the amazing DeepSpeed optimizations from @Guanhua_Wang_, Heyang Qin, @toh_tana, @QuentinAnthon15, and @samadejacobs presented by @ammar_awan at MUG '24.
Dr. Ammar Ahmad Awan from Microsoft DeepSpeed giving a presentation at MUG '24 on trillion-parameter LLMs and optimization with MVAPICH. @OSUengineering @Microsoft @OhTechCo @mvapich @MSFTDeepSpeed @MSFTDeepSpeedJP #MUG24 #MPI #AI #LLM #DeepSpeed
Announcing that DeepSpeed now runs natively on Windows. This exciting combination unlocks DeepSpeed optimizations to Windows users and empowers more people and organizations with AI innovations. - HF Inference & Finetuning - LoRA - CPU Offload Blog: shorturl.at/a7TF8
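A hedged sketch of the HF inference path mentioned above, using the existing deepspeed.init_inference API; the model name and settings are illustrative and a CUDA GPU is assumed.

```python
# Sketch: Hugging Face inference with DeepSpeed kernel injection, now usable
# natively on Windows. Model and generation settings are illustrative.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

engine = deepspeed.init_inference(
    model, dtype=torch.float16, replace_with_kernel_inject=True
)

inputs = tok("DeepSpeed on Windows", return_tensors="pt").to(engine.module.device)
print(tok.decode(engine.module.generate(**inputs, max_new_tokens=32)[0]))
```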
💡Check out Comet’s latest integration with DeepSpeed, a deep learning optimization library! 🤝With the @MSFTDeepSpeed + @Cometml integration automatically start logging training metrics generated by DeepSpeed. Try the quick-start Colab to get started: colab.research.google.com/github/comet-m…
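A rough sketch of what enabling Comet logging from the DeepSpeed side could look like, modeled on DeepSpeed's tensorboard/wandb monitor config; the "comet" keys here are assumptions, so check the quick-start Colab for the actual integration.

```python
# Sketch (assumed keys): a DeepSpeed config with a Comet monitor section, so
# training metrics are logged to Comet automatically during the run.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "comet": {
        "enabled": True,                  # assumed key
        "project": "deepspeed-demo",      # assumed key, placeholder project
        "experiment_name": "zero2-run",   # assumed key, placeholder name
    },
}
```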