深受数千家公司学习者的喜爱
培训2人或更多?
试用DataCamp for Business课程描述
Preparing Data for Distributed Training
You'll begin by preparing data for distributed training by splitting datasets across multiple devices and deploying model copies to each device. You'll gain hands-on experience in preprocessing data for distributed environments, including images, audio, and text.Exploring Efficiency Techniques
Once your data is ready, you'll explore ways to improve efficiency in training and optimizer use across multiple interfaces. You'll see how to address these challenges by improving memory usage, device communication, and computational efficiency with techniques like gradient accumulation, gradient checkpointing, local stochastic gradient descent, and mixed precision training. You'll understand the tradeoffs between different optimizers to help you decrease your model's memory footprint. By the end of this course, you'll be equipped with the knowledge and tools to build distributed AI-powered services.先决条件
Intermediate Deep Learning with PyTorchWorking with Hugging Face1
Data Preparation with Accelerator
You'll prepare data for distributed training by splitting the data across multiple devices and copying the model on each device. Accelerator provides a convenient interface for data preparation, and you'll learn how to preprocess images, audio, and text as a first step in distributed training.
2
Distributed Training with Accelerator and Trainer
In distributed training, each device trains on its data in parallel. You'll investigate two methods for distributed training: Accelerator enables custom training loops, and Trainer simplifies the interface for training.
3
Improving Training Efficiency
Distributed training strains resources with large models and datasets, but you can address these challenges by improving memory usage, device communication, and computational efficiency. You'll discover the techniques of gradient accumulation, gradient checkpointing, local stochastic gradient descent, and mixed precision training.
4
Training with Efficient Optimizers
You'll focus on optimizers as levers to improve distributed training efficiency, highlighting tradeoffs between AdamW, Adafactor, and 8-bit Adam. Reducing the number of parameters or using low precision helps to decrease a model's memory footprint.
Efficient AI Model Training with PyTorch
课程完成 通过 DataCamp for Mobile 提升您的数据技能
随时随地通过我们的移动课程和每日 5 分钟编程挑战提升技能。