Efficient AI Model Training with PyTorch

4 hr · 3,850 XP

Course Description

Distributed training is an essential skill in large-scale machine learning, helping you to reduce the time required to train large language models with trillions of parameters. In this course, you will explore the tools, techniques, and strategies essential for efficient distributed training using PyTorch, Accelerator, and Trainer.

Preparing Data for Distributed Training

You'll begin by preparing data for distributed training: splitting datasets across multiple devices and deploying a copy of the model to each device. You'll gain hands-on experience preprocessing data for distributed environments, including images, audio, and text.

Exploring Efficiency Techniques

Once your data is ready, you'll explore ways to make training and optimizer use more efficient. Large models and datasets strain memory, device communication, and compute, and you'll see how to address these challenges with techniques like gradient accumulation, gradient checkpointing, local stochastic gradient descent, and mixed precision training. You'll also learn the tradeoffs between different optimizers to help you decrease your model's memory footprint. By the end of this course, you'll be equipped with the knowledge and tools to build distributed AI-powered services.
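
As a taste of mixed precision training, here is a minimal sketch using Accelerator's mixed_precision flag; the toy model and data are placeholders rather than course material, and fp16 assumes a CUDA GPU (bf16 works on recent CPUs and GPUs).

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")  # fp16 assumes a CUDA GPU

# Hypothetical toy model and data, just to make the sketch runnable
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

# prepare() moves the model, optimizer, and data to the right device(s)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # applies loss scaling under fp16
    optimizer.step()
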
1. Data Preparation with Accelerator (Free)

You'll prepare data for distributed training by splitting it across multiple devices and copying the model to each device. Accelerator provides a convenient interface for data preparation, and you'll learn how to preprocess images, audio, and text as a first step in distributed training (see the sketch after the lesson list below).

- Prepare models with AutoModel and Accelerator (50 xp)
- Loading and inspecting pre-trained models (100 xp)
- Automatic device placement with Accelerator (100 xp)
- Preprocess images and audio for training (50 xp)
- Preprocess image datasets (100 xp)
- Preprocess audio datasets (100 xp)
- Prepare datasets for distributed training (100 xp)
- Preprocess text for training (50 xp)
- Preprocess text with AutoTokenizer (100 xp)
- Save and load the state of preprocessed text (100 xp)
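
As a hedged sketch of this chapter's workflow, the snippet below tokenizes text with AutoTokenizer and lets Accelerator copy the model to each device and shard the data. The checkpoint ("bert-base-uncased") is an illustrative assumption; MRPC is taken from the datasets list further down the page.

from accelerate import Accelerator
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint; the course may use a different one
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

dataset = load_dataset("glue", "mrpc", split="train")

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"],
                     truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)
dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)

accelerator = Accelerator()
# prepare() places a copy of the model on each device and shards the
# dataloader, so every process sees its own slice of the data
model, dataloader = accelerator.prepare(model, dataloader)
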
2. Distributed Training with Accelerator and Trainer

In distributed training, each device trains on its own portion of the data in parallel. You'll investigate two methods for distributed training: Accelerator enables custom training loops, while Trainer provides a simplified, higher-level training interface.
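
A minimal sketch contrasting the two interfaces, assuming the model, dataset, and dataloader built in the Chapter 1 sketch above (before prepare()); the learning rate and arguments are illustrative.

# Method 1: a custom training loop with Accelerator
import torch
from accelerate import Accelerator

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    optimizer.zero_grad()
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["label"])
    accelerator.backward(outputs.loss)  # replaces loss.backward()
    optimizer.step()

# Method 2: the higher-level Trainer interface
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="out", per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()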

3. Improving Training Efficiency

Large models and datasets strain resources in distributed training, but you can address these challenges by improving memory usage, device communication, and computational efficiency. You'll discover the techniques of gradient accumulation, gradient checkpointing, local stochastic gradient descent, and mixed precision training.
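
As a hedged sketch (reusing the model, optimizer, and dataloader from the earlier snippets, and assuming the model is a transformers model), gradient accumulation and gradient checkpointing combine like this; local SGD, which synchronizes parameters only every few steps, is also available in Accelerate via a LocalSGD context manager.

from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
model.gradient_checkpointing_enable()  # recompute activations to save memory

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    # accumulate() defers gradient synchronization until
    # 4 micro-batches' worth of gradients have built up
    with accelerator.accumulate(model):
        outputs = model(input_ids=batch["input_ids"],
                        attention_mask=batch["attention_mask"],
                        labels=batch["label"])
        accelerator.backward(outputs.loss)
        optimizer.step()
        optimizer.zero_grad()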

4. Training with Efficient Optimizers

You'll focus on optimizers as levers for improving distributed training efficiency, weighing the tradeoffs between AdamW, Adafactor, and 8-bit Adam. Shrinking the optimizer state or storing it in low precision helps decrease a model's memory footprint.
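
A hedged sketch of how the three optimizers are swapped in, reusing the model from the earlier snippets; the learning rates are illustrative, and 8-bit Adam additionally requires the bitsandbytes package and a CUDA GPU.

import torch
import bitsandbytes as bnb
from transformers.optimization import Adafactor

# AdamW: the baseline; keeps two full-precision states per parameter
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Adafactor: factorizes the second-moment estimate to shrink optimizer state
optimizer = Adafactor(model.parameters(), lr=2e-5,
                      scale_parameter=False, relative_step=False)

# 8-bit Adam: quantizes optimizer states to 8 bits
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=2e-5)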


Datasets

- Audio dataset
- Crop image
- Agricultural QA dataset
- MRPC dataset

Collaborators

James Chapman
Jasmin Ludolf
Francesca Donadoni
Dennis Lee

Software Engineer at Amazon

Dennis is passionate about simplifying science and technology for everyone. He is a software engineer at Amazon, optimizing supply chain networks, with experience across software engineering, data science, and data engineering in fields ranging from management consulting to operations. He earned his Ph.D. in Electrical and Computer Engineering.
