
Course Description
Distributed training is an essential skill in large-scale machine learning, helping you reduce the time required to train large language models with trillions of parameters. In this course, you will explore the tools, techniques, and strategies for efficient distributed training using PyTorch, Accelerator, and Trainer.
Preparing Data for Distributed Training
You'll begin by preparing data for distributed training: splitting datasets across multiple devices and deploying a copy of the model to each device. You'll gain hands-on experience preprocessing images, audio, and text for distributed environments.
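As a taste of what that preprocessing step looks like, here is a minimal sketch of tokenizing a text dataset with the Hugging Face datasets and transformers libraries; the dataset ("imdb") and checkpoint ("bert-base-uncased") are illustrative placeholders, not necessarily the ones used in the course.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative dataset and checkpoint; the course exercises may use different ones.
dataset = load_dataset("imdb", split="train[:1%]")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad and truncate so every example has the same length on every device.
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=128)

# Tokenize in batches, then keep only the tensor columns the model needs.
tokenized = dataset.map(tokenize, batched=True)
tokenized = tokenized.remove_columns(["text"]).rename_column("label", "labels")
tokenized.set_format("torch")
```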
Exploring Efficiency Techniques
Once your data is ready, you'll explore ways to improve training and optimizer efficiency with both interfaces. You'll see how to ease the strain that large models place on memory, device communication, and compute with techniques like gradient accumulation, gradient checkpointing, local stochastic gradient descent (SGD), and mixed precision training. You'll also weigh the tradeoffs between different optimizers to decrease your model's memory footprint. By the end of this course, you'll be equipped with the knowledge and tools to build distributed AI-powered services.
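As a preview of one of these techniques, below is a minimal sketch of mixed precision training in plain PyTorch using autocast and a gradient scaler; the model, learning rate, and dataloader are placeholders for illustration only.

```python
import torch

model = torch.nn.Linear(128, 2).cuda()                      # placeholder model on a GPU
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # illustrative learning rate
scaler = torch.cuda.amp.GradScaler()                        # rescales the loss so fp16 gradients don't underflow

for features, labels in dataloader:                         # dataloader assumed to yield float features and int labels
    features, labels = features.cuda(), labels.cuda()
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(features), labels)
    scaler.scale(loss).backward()                           # backward pass on the scaled loss
    scaler.step(optimizer)                                  # unscales gradients, then steps
    scaler.update()                                         # adapts the scale factor for the next step
```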
1. Data Preparation with Accelerator (Free)
You'll prepare data for distributed training by splitting the data across multiple devices and copying the model onto each device. Accelerator provides a convenient interface for data preparation, and you'll learn how to preprocess images, audio, and text as a first step in distributed training.
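The pattern this chapter works toward might look roughly like the following sketch, which assumes the transformers and accelerate packages and reuses the tokenized dataset from the earlier preprocessing sketch; the checkpoint and batch size are illustrative.

```python
from accelerate import Accelerator
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification

accelerator = Accelerator()  # detects whatever hardware is available: CPU, one GPU, or several

# Illustrative checkpoint; "tokenized" is the dataset from the preprocessing sketch above.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
dataloader = DataLoader(tokenized, batch_size=16, shuffle=True)

# prepare() moves the model to the right device(s) and shards each batch across them.
model, dataloader = accelerator.prepare(model, dataloader)
print(accelerator.device)  # the device this particular process trains on
```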
- Prepare models with AutoModel and Accelerator (50 xp)
- Loading and inspecting pre-trained models (100 xp)
- Automatic device placement with Accelerator (100 xp)
- Preprocess images and audio for training (50 xp)
- Preprocess image datasets (100 xp)
- Preprocess audio datasets (100 xp)
- Prepare datasets for distributed training (100 xp)
- Preprocess text for training (50 xp)
- Preprocess text with AutoTokenizer (100 xp)
- Save and load the state of preprocessed text (100 xp)
2. Distributed Training with Accelerator and Trainer
In distributed training, each device trains on its own share of the data in parallel. You'll investigate two approaches: Accelerator enables custom training loops, while Trainer provides a simpler, higher-level interface for training.
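A minimal Trainer setup along these lines might look like the sketch below; the metric, argument values, and the reuse of the model and dataset from the earlier sketches are illustrative assumptions rather than the course's exact settings.

```python
import numpy as np
import evaluate
from transformers import Trainer, TrainingArguments

accuracy = evaluate.load("accuracy")  # evaluation metric

def compute_metrics(eval_pred):
    # Convert raw logits into class predictions before scoring.
    logits, labels = eval_pred
    return accuracy.compute(predictions=np.argmax(logits, axis=-1), references=labels)

training_args = TrainingArguments(
    output_dir="./results",          # where checkpoints are written
    per_device_train_batch_size=16,  # per device: each GPU sees its own shard of each batch
    num_train_epochs=1,
)

trainer = Trainer(
    model=model,                     # model and tokenized dataset from the earlier sketches
    args=training_args,
    train_dataset=tokenized,
    eval_dataset=tokenized,
    compute_metrics=compute_metrics,
)
trainer.train()
```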
- Fine-tune models with Trainer (50 xp)
- Define evaluation metrics (100 xp)
- Specify the TrainingArguments (100 xp)
- Set up the Trainer (100 xp)
- Train models with Accelerator (50 xp)
- Prepare a model for distributed training (100 xp)
- Training loops before and after Accelerator (100 xp)
- Building a training loop with Accelerator (100 xp)
- Evaluate models with Accelerator (50 xp)
- Setting the model in evaluation mode (100 xp)
- Logging evaluation metrics (100 xp)
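The training-loop and evaluation lessons listed above build toward a pattern roughly like this sketch of a custom loop with Accelerator, again reusing the placeholder model and dataloader from the earlier sketches.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative optimizer

# One call wraps everything the loop touches for the current hardware setup.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for batch in dataloader:
    optimizer.zero_grad()
    outputs = model(**batch)               # transformers models return an object with a .loss attribute
    accelerator.backward(outputs.loss)     # replaces loss.backward() so gradients sync across devices
    optimizer.step()

model.eval()                               # disable dropout and similar layers before evaluation
with torch.no_grad():
    for batch in dataloader:
        predictions = model(**batch).logits.argmax(dim=-1)
```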
3. Improving Training Efficiency
Distributed training strains resources with large models and datasets, but you can address these challenges by improving memory usage, device communication, and computational efficiency. You'll discover the techniques of gradient accumulation, gradient checkpointing, local stochastic gradient descent, and mixed precision training.
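For example, gradient accumulation with Accelerator can be expressed with its accumulate context manager, roughly as in this sketch; the accumulation step count of 4 is an illustrative choice, and the model, optimizer, and dataloader are placeholders from the earlier sketches.

```python
from accelerate import Accelerator

# Accumulate gradients over 4 batches before each update,
# simulating a 4x larger batch size without the extra memory.
accelerator = Accelerator(gradient_accumulation_steps=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for batch in dataloader:
    with accelerator.accumulate(model):
        outputs = model(**batch)
        accelerator.backward(outputs.loss)
        optimizer.step()        # only applies a real update on every 4th batch
        optimizer.zero_grad()
```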
- Gradient accumulation (50 xp)
- Gradient accumulation with Accelerator (100 xp)
- Gradient accumulation with Trainer (100 xp)
- Gradient checkpointing and local SGD (50 xp)
- Gradient checkpointing with Accelerator (100 xp)
- Gradient checkpointing with Trainer (100 xp)
- Local SGD with Accelerator (100 xp)
- Mixed precision training (50 xp)
- Mixed precision training with basic PyTorch (100 xp)
- Mixed precision training with Accelerator (100 xp)
- Mixed precision training with Trainer (100 xp)
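Gradient checkpointing, covered in the lessons above, trades extra forward compute for lower activation memory; with a transformers model it can be switched on with a single call, as in this sketch, which again reuses the placeholder objects from the earlier sketches.

```python
from accelerate import Accelerator

# Recompute activations during the backward pass instead of storing them all,
# trading extra forward compute for a smaller activation memory footprint.
model.gradient_checkpointing_enable()      # available on transformers models

accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for batch in dataloader:
    outputs = model(**batch)
    accelerator.backward(outputs.loss)
    optimizer.step()
    optimizer.zero_grad()
```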
4. Training with Efficient Optimizers
You'll focus on optimizers as levers to improve distributed training efficiency, highlighting the tradeoffs between AdamW, Adafactor, and 8-bit Adam. Tracking fewer optimizer parameters or storing them in lower precision helps decrease a model's memory footprint.
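As a rough illustration of that tradeoff, the sketch below sets up AdamW and Adafactor side by side and adds a small helper for estimating optimizer state size; the learning rates are illustrative, and the helper is a hypothetical utility rather than a course function.

```python
import torch
from transformers import Adafactor

# AdamW keeps two full-precision moment tensors per parameter.
adamw = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Adafactor factorizes the second moment, so its state is far smaller.
adafactor = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,   # use the fixed learning rate above
    relative_step=False,     # instead of Adafactor's internal schedule
)

def optimizer_state_bytes(optimizer):
    # Hypothetical helper: sums the sizes of all tensors the optimizer keeps as state.
    # Note that the state dict is only populated after the first optimizer.step().
    return sum(
        t.numel() * t.element_size()
        for state in optimizer.state.values()
        for t in state.values()
        if torch.is_tensor(t)
    )
```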
- Balanced training with AdamW (50 xp)
- AdamW with Trainer (100 xp)
- AdamW with Accelerator (100 xp)
- Compute the optimizer size (100 xp)
- Memory-efficient training with Adafactor (50 xp)
- Adafactor with Trainer (100 xp)
- Adafactor with Accelerator (100 xp)
- Mixed precision training with 8-bit Adam (50 xp)
- Set up the 8-bit Adam optimizer (100 xp)
- 8-bit Adam with Trainer (100 xp)
- 8-bit Adam with Accelerator (100 xp)
- Which optimizer is it? (100 xp)
- Congratulations! (50 xp)
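The 8-bit Adam lessons above presumably rely on an 8-bit optimizer implementation such as the one in the bitsandbytes package; the sketch below shows one way it could be plugged into Trainer, with the package choice, model, and dataset reuse all being illustrative assumptions.

```python
import bitsandbytes as bnb
from transformers import Trainer, TrainingArguments

# 8-bit Adam keeps its moment estimates in 8-bit precision,
# shrinking optimizer memory by roughly 4x versus 32-bit AdamW.
adam_8bit = bnb.optim.Adam8bit(model.parameters(), lr=2e-5)

# Trainer accepts a custom optimizer through the `optimizers` tuple (optimizer, lr_scheduler).
trainer = Trainer(
    model=model,                                   # model and dataset from the earlier sketches
    args=TrainingArguments(output_dir="./results-8bit"),
    train_dataset=tokenized,
    optimizers=(adam_8bit, None),                  # None lets Trainer create a default scheduler
)
trainer.train()
```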


Dennis, Software Engineer at Amazon
Dennis is passionate about simplifying science and technology for everyone. He is a software engineer at Amazon, optimizing supply chain networks. He has experience across software engineering, data science, and data engineering in various industries from management consulting to operations. He earned his Ph.D. in Electrical and Computer Engineering.
Join over 18 million learners and start Efficient AI Model Training with PyTorch today!