
Deep Learning with PyTorch Cheat Sheet

Learn everything you need to know about PyTorch in this convenient cheat sheet
Sep 2023  · 6 min read

PyTorch is an open-source machine learning library primarily developed by Facebook's AI Research Lab (FAIR). It is widely used for various machine learning and deep learning tasks, including neural network development, natural language processing (NLP), computer vision, and reinforcement learning. In this cheat sheet, learn all the fundamentals of working with PyTorch in one convenient location!




  • PyTorch is one of the most popular deep learning frameworks, with a syntax similar to NumPy.
  • In the context of PyTorch, you can think of a Tensor as a NumPy array that can be run on a CPU or a GPU, and has a method for automatic differentiation (needed for backpropagation).
  • TorchText, TorchVision, and TorchAudio are Python packages that provide PyTorch with functionality for text, image, and audio data, respectively.
  • A neural network consists of neurons arranged into layers. Input values are passed to the first layer of the network. Each neuron applies a weight to each of its inputs and has a bias: its output is the weighted sum of its inputs plus the bias. This output is passed on to any connected neurons in the next layer, and so on until the final layer of the network is reached.
  • An activation function is a transformation of the output from a neuron, and is used to introduce non-linearity into the calculations.
  • Backpropagation is an algorithm used to train neural networks by iteratively adjusting the weights and biases of each neuron.
  • Saturation is when the output from a neuron reaches a maximum or minimum value beyond which it cannot change. This can reduce learning performance, and an activation function such as ReLU may be needed to avoid the phenomenon.
  • The loss function quantifies the difference between the predicted output of a model and the actual target output.
  • The optimizer is an algorithm to adjust the parameters (neuron weights and biases) of a neural network during the training process in order to minimize the loss function.
  • The learning rate controls the step size of the optimizer. If the learning rate is too low the optimization will take too long. If it is too high, the optimizer will not effectively minimize the loss function leading to poor predictions.
  • Momentum controls the inertia of the optimizer. If momentum is too low, the optimizer can get stuck at a local minimum and give the wrong answer. If it is too high, the optimizer can fail to converge and not give an answer.
  • Transfer learning is reusing a model trained on one task for a second similar task to accelerate the training process.
  • Fine-tuning is a type of transfer learning where early layers are frozen, and only the layers close to the output are trained.
  • Accuracy is a metric to determine how well a model fits a dataset. It quantifies the proportion of correctly predicted outcomes (either classifications or predictions) compared to the total number of data points in the dataset.
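The weighted-sum-plus-bias behavior of a single neuron described above can be sketched directly with tensors. This is a minimal illustration; the input, weight, and bias values are made up:

```python
import torch

# Inputs to a single neuron, its weights, and its bias (arbitrary values)
inputs = torch.tensor([0.5, -1.0, 2.0])
weights = torch.tensor([0.8, 0.2, -0.5])
bias = torch.tensor(0.1)

# Neuron output = weighted sum of inputs, plus the bias
output = torch.dot(inputs, weights) + bias  # -0.7

# Apply a ReLU activation to introduce non-linearity;
# the negative pre-activation is clipped to zero
activated = torch.relu(output)
print(activated)  # tensor(0.)
```

In a real network this calculation happens for every neuron in every layer; PyTorch's `nn.Linear` performs it for a whole layer at once.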

Importing PyTorch

# Import the top-level package for core functionality
import torch

# Import neural network functionality
from torch import nn

# Import functional programming tools
import torch.nn.functional as F

# Import optimization functionality
import torch.optim as optim

# Import dataset functions
from torch.utils.data import TensorDataset, DataLoader

# Import evaluation metrics
import torchmetrics

Working with Tensors

# Create tensor from list with tensor()
tnsr = torch.tensor([1, 3, 6, 10])

# Get data type of tensor elements with .dtype
tnsr.dtype # Returns torch.int64

# Get dimensions of tensor with .shape
tnsr.shape # Returns torch.Size([4])

# Get memory location of tensor with .device
tnsr.device # Returns the device, e.g., cpu or cuda:0

# Create a tensor of zeros with zeros()
tnsr_zrs = torch.zeros(2, 3)

# Create a random tensor with rand()
tnsr_rndm = torch.rand(size=(3, 4)) # Tensor has 3 rows, 4 columns
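As noted earlier, tensors support automatic differentiation, which backpropagation relies on. A minimal sketch (the value of `x` is arbitrary):

```python
import torch

# Create a tensor that tracks gradients
x = torch.tensor(3.0, requires_grad=True)

# Define a function of x, here y = x^2
y = x ** 2

# Backpropagate to compute dy/dx
y.backward()

print(x.grad)  # tensor(6.), since dy/dx = 2x = 6 at x = 3
```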

Datasets and Dataloaders

# Create a dataset from a pandas DataFrame with TensorDataset()
X = df[feature_columns].values
y = df[target_column].values
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())

# Load the data in batches with DataLoader()
dataloader = DataLoader(dataset, batch_size=n, shuffle=True)


# One-hot encode categorical variables with one_hot()
F.one_hot(torch.tensor([0, 1, 2]), num_classes=3) # Returns tensor of 0s and 1s
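Putting the two together, a `DataLoader` yields `(features, targets)` batches you can loop over. A quick sketch with made-up data standing in for the DataFrame columns:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data standing in for DataFrame columns (values are random)
X = torch.rand(8, 3)  # 8 samples, 3 features
y = torch.rand(8, 1)  # 8 targets
dataset = TensorDataset(X, y)

# Iterate over the data in batches of 4
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
for features, targets in dataloader:
    print(features.shape, targets.shape)  # torch.Size([4, 3]) torch.Size([4, 1])
```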

Sequential Model Architecture

# Create a linear layer with m inputs, n outputs with Linear()
lnr = nn.Linear(m, n)

# Get weight of layer with .weight
lnr.weight

# Get bias of layer with .bias
lnr.bias

# Create a sigmoid activation layer for binary classification with Sigmoid()
nn.Sigmoid()

# Create a softmax activation layer for multi-class classification with Softmax()
nn.Softmax(dim=-1)

# Create a rectified linear unit activation layer to avoid saturation with ReLU()
nn.ReLU()

# Create a leaky rectified linear unit activation layer to avoid saturation with LeakyReLU()
nn.LeakyReLU()

# Create a dropout layer to regularize and prevent overfitting with Dropout()
nn.Dropout(p=0.5)

# Create a sequential model from layers
model = nn.Sequential(
    nn.Linear(n_features, i),
    nn.Linear(i, j),   # Input size must match output from previous layer
    nn.Linear(j, n_classes),
    nn.Softmax(dim=-1) # Activation layer comes last
)
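A model built this way can be called directly on a batch of inputs. A quick sketch with placeholder sizes (the layer widths here are arbitrary):

```python
import torch
from torch import nn

# Placeholder sizes for illustration
n_features, n_classes = 4, 3
model = nn.Sequential(
    nn.Linear(n_features, 8),
    nn.ReLU(),
    nn.Linear(8, n_classes),
    nn.Softmax(dim=-1),
)

# Forward pass on a batch of 2 samples
batch = torch.rand(2, n_features)
probs = model(batch)
print(probs.shape)        # torch.Size([2, 3])
print(probs.sum(dim=-1))  # Each row sums to 1 (softmax outputs are probabilities)
```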

Fitting a model and calculating loss

# Get predictions by calling the model on input data, where model was created by, e.g., Sequential()
prediction = model(input_data).double()

# Get target values
actual = torch.tensor(target_values).double()

# Calculate the mean-squared error loss for regression with MSELoss()
mse_loss = nn.MSELoss()(prediction, actual) # Returns tensor(x) 

# Calculate the L1 loss for robust regression with SmoothL1Loss()
l1_loss = nn.SmoothL1Loss()(prediction, actual) # Returns tensor(x) 

# Calculate binary cross-entropy loss for binary classification with BCELoss()
bce_loss = nn.BCELoss()(prediction, actual) # Returns tensor(x) 

# Calculate cross-entropy loss for multi-class classification with CrossEntropyLoss()
ce_loss = nn.CrossEntropyLoss()(prediction, actual) # Returns tensor(x) 

# Calculate the gradients via backpropagation with .backward()
mse_loss.backward()

Working with Optimizers

# Create a stochastic gradient descent optimizer with SGD(), setting learning rate and momentum
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

# Update neuron parameters with .step()
optimizer.step()

The Training Loop

# Set model to training mode
model.train()
# Set a loss criterion and an optimizer
loss_criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
# Loop over chunks of data in the training set
for data in dataloader:
    # Set the gradients to zero with .zero_grad()
    optimizer.zero_grad()
    # Get features and targets for current chunk of data
    features, targets = data
    # Run a "forward pass" to get predictions from the model
    predictions = model(features)
    # Calculate loss
    loss = loss_criterion(predictions, targets)
    # Calculate gradients using backpropagation
    loss.backward()
    # Update the model parameters
    optimizer.step()

The Evaluation Loop

# Set model to evaluation mode
model.eval()

# Create accuracy metric with Accuracy()
metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)
# Loop over chunks of data in the validation set
for i, data in enumerate(dataloader, 0):
    # Get features and targets for current chunk of data
    features, targets = data
    # Run a "forward pass" to get predictions from the model
    predictions = model(features)
    # Calculate accuracy over the batch
    accuracy = metric(predictions.argmax(dim=-1), targets)
# Calculate accuracy over all the validation data
accuracy = metric.compute()
print(f"Accuracy on all data: {accuracy}")
# Reset the metric for the next dataset (training or validation)
metric.reset()

Transfer Learning and Fine-Tuning

# Save a layer of a model to a file with save()
torch.save(model[0], 'layer.pth')

# Load a layer of a model from a file with load()
new_layer = torch.load('layer.pth')

# Freeze the weight for layer 0 with .requires_grad
for name, param in model.named_parameters():
    if name == "0.weight":
        param.requires_grad = False
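For fine-tuning as defined above, you typically freeze every layer except the ones closest to the output. A minimal sketch on a two-layer model (layer sizes are arbitrary):

```python
from torch import nn

# A small sequential model for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Freeze every layer except the last one; in a Sequential model,
# parameter names are prefixed with the layer index ("0.", "1.", ...)
for name, param in model.named_parameters():
    if not name.startswith("1."):
        param.requires_grad = False

# Only the final layer's parameters remain trainable
print([name for name, p in model.named_parameters() if p.requires_grad])
# ['1.weight', '1.bias']
```

The optimizer will then update only the unfrozen parameters during training.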
