PyTorch 2.0 is Here: Everything We Know

Explore the latest release of PyTorch, which is faster, more Pythonic, and more dynamic.
May 2023  · 6 min read

PyTorch is an open-source and community-led deep learning framework that provides a flexible and efficient way of building machine learning models. It has a user-friendly interface, extensive community support, and seamless integration with the Python ecosystem.

PyTorch 2.0 introduces fundamental changes to how PyTorch operates at the compiler level while preserving the same familiarity and ease of use for developers. The release promises faster performance and expanded support for dynamic shapes and distributed training.

What is New in PyTorch 2.0?

PyTorch is moving parts from C++ back into Python, making it faster and more hackable. Version 2.0 introduces `torch.compile`, which changes how PyTorch operates at the compiler level. This feature is optional and does not affect your existing code.

PyTorch 2.0 Compile

To provide a solid foundation for the `torch.compile` feature, new technologies were introduced:

  • TorchDynamo. A Python-level Just-in-Time (JIT) compiler specifically designed to accelerate PyTorch. By integrating with the frame evaluation API in CPython, it dynamically modifies Python bytecode at runtime, enabling faster execution of code.
  • AOTAutograd. A toolkit for assisting developers in accelerating model training on PyTorch. It traces both the forward and backward graphs in advance. Moreover, AOTAutograd offers simple mechanisms to compile the extracted graphs seamlessly using cutting-edge deep-learning compilers.
  • PrimTorch. By significantly reducing the number of PyTorch operators from over 2000 to a concise set of approximately 250 primitive operators, PrimTorch has remarkably simplified the process of developing PyTorch features or backends.
  • TorchInductor. A PyTorch-native deep learning compiler that automatically maps PyTorch models into generated code for multiple accelerators and backends. TorchInductor uses OpenAI Triton as a building block for GPU acceleration.

All of these technologies are written in Python and support dynamic shapes. This makes PyTorch code run faster and makes the framework more flexible and easier to hack on, lowering the barrier to entry.
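To make the graph-capture step more concrete, the sketch below passes a custom backend to `torch.compile` so the FX graph that TorchDynamo extracts can be inspected. The names `inspecting_backend`, `captured_graphs`, and `toy_fn` are illustrative and not part of PyTorch. The backend here only records the graph and returns it unchanged, so nothing is actually optimized.

```python
import torch

# Illustrative container for the graphs TorchDynamo hands to the backend
captured_graphs = []

def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    """Hypothetical backend: record the captured graph, then run it as-is."""
    captured_graphs.append(gm)
    return gm.forward  # a backend must return a callable

def toy_fn(x, y):
    return torch.sin(x) + torch.cos(y)

# A custom callable can be passed in place of a backend name
compiled_fn = torch.compile(toy_fn, backend=inspecting_backend)

x, y = torch.randn(8), torch.randn(8)
result = compiled_fn(x, y)  # capture happens on the first call

# Print the operations TorchDynamo traced out of the Python function
print(captured_graphs[0].graph)
```

A real backend such as TorchInductor would generate optimized code from the same graph instead of returning it untouched.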

Code Examples

Let us review a quick and easy code example of the PyTorch compiler.

Without torch.compile

import torch
model = torch.hub.load("pytorch/vision", "resnet50", weights="IMAGENET1K_V2")

With torch.compile

To boost model performance, just add the torch.compile wrapper around the model and get a compiled model. It is plug-and-play.

import torch
model = torch.hub.load("pytorch/vision", "resnet50", weights="IMAGENET1K_V2")
compiled_model = torch.compile(model)
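As a rough check that wrapping a model does not change its results, the sketch below compares the outputs of an eager model and its compiled wrapper. The small `nn.Sequential` model is a hypothetical stand-in chosen to avoid downloading weights, and `backend="eager"` is an assumption made so the sketch runs without a GPU or a C++ toolchain; omit that argument to use the default TorchInductor backend.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in model so the sketch needs no download
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()

# backend="eager" keeps the sketch portable; drop it to use TorchInductor
compiled_model = torch.compile(model, backend="eager")

x = torch.randn(2, 16)
with torch.no_grad():
    eager_out = model(x)
    compiled_out = compiled_model(x)  # the first call triggers compilation

# The wrapper should produce the same values as the original model
outputs_match = torch.allclose(eager_out, compiled_out, atol=1e-6)
print("Outputs match:", outputs_match)
```

With the default backend, the first call is noticeably slower because compilation happens then; later calls with the same input shapes reuse the compiled code.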

Learn to build deep learning models with the PyTorch library by taking the Deep Learning with PyTorch course.

You can simply train your model without changing a thing.

import torch

# model, dataloader, and run_epoch are assumed to be defined elsewhere
model = torch.compile(model)
for batch in dataloader:
    run_epoch(model, batch)
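The loop above leaves the model, data, and epoch function undefined. A minimal self-contained sketch of the same idea might look like the following. The synthetic data, the `train_one_epoch` helper, and the hyperparameters are all hypothetical, and `backend="eager"` is used only so the example runs without a GPU or C++ compiler.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (hypothetical): target is the sum of the features
features = torch.randn(64, 10)
targets = features.sum(dim=1, keepdim=True)
dataloader = DataLoader(TensorDataset(features, targets), batch_size=16)

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

# The compiled wrapper shares parameters with the original model
compiled_model = torch.compile(model, backend="eager")

def train_one_epoch(net, loader):
    """Hypothetical helper: one pass over the data, returns the mean loss."""
    total = 0.0
    for inputs, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(net(inputs), labels)
        loss.backward()
        optimizer.step()
        total += loss.item()
    return total / len(loader)

losses = [train_one_epoch(compiled_model, dataloader) for _ in range(5)]
print(losses)
```

The training code itself is unchanged; only the model passed into the loop is the compiled wrapper.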

Or you can run model inference.

model = torch.compile(model)

The torch.compile() function comes with additional parameters:

  1. mode: you can specify what compiler should optimize while compiling.
  2. dynamic: it is used to enable the code path for Dynamic Shapes.
  3. fullgraph: it compiles the program into a single graph.
  4. backend: by default it uses TorchInductor, but you can specify other available compiler backends.

def torch.compile(model: Callable,
  mode: Optional[str] = "default",
  dynamic: bool = False,
  fullgraph: bool = False,
  backend: Union[str, Callable] = "inductor",
) -> torch._dynamo.NNOptimizedModule
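The parameters listed above could be used roughly as follows. This is a sketch, not an exhaustive reference: the toy model and the variable names are illustrative, and `torch._dynamo.list_backends()` lives in an underscore-prefixed module whose output may differ between PyTorch versions.

```python
import torch
import torch._dynamo
import torch.nn as nn

# Illustrative toy model
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU())

# Wrapping is lazy: nothing is compiled until the wrapper is first called
low_overhead = torch.compile(model, mode="reduce-overhead")  # reduce framework overhead
autotuned = torch.compile(model, mode="max-autotune")        # longer compile, faster code
single_graph = torch.compile(model, fullgraph=True)          # error instead of graph breaks
dynamic_shapes = torch.compile(model, dynamic=True)          # enable the dynamic-shape path
eager_backend = torch.compile(model, backend="eager")        # capture only, no code generation

# Names of the non-debug backends registered in this installation
available_backends = torch._dynamo.list_backends()
print(available_backends)
```

A reasonable starting point is the default mode, moving to the other modes only after measuring whether they help for a given model.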

Learn the basics of PyTorch API and create a simple neural network from scratch using our PyTorch Tutorial.


To benchmark the new compile feature, the PyTorch developers used 163 open-source models (46 from Hugging Face Transformers, 61 from TIMM, and 56 from TorchBench). The benchmark was built carefully to include tasks such as image classification, image generation, language modeling, recommender systems, and reinforcement learning.

The result shows significantly improved performance while training on NVIDIA A100 GPUs.

Note: currently, the default backend only supports CPUs and the NVIDIA Volta and Ampere GPU series.
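One way to check this before compiling is to look at the CUDA compute capability of the current device. The sketch below assumes that Volta corresponds to compute capability 7.0 and that anything older should fall back to the CPU; the helper name `pick_compile_device` is hypothetical.

```python
import torch

def pick_compile_device() -> str:
    """Hypothetical helper: choose a device the default backend can target."""
    if not torch.cuda.is_available():
        return "cpu"
    major, _minor = torch.cuda.get_device_capability()
    # Assumption: Volta is compute capability 7.0, so older GPUs use the CPU path
    return "cuda" if major >= 7 else "cpu"

device = pick_compile_device()
print("Compiling for:", device)
```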

PyTorch compiler benchmark on NVIDIA A100 GPU

This is just a start; upcoming releases are expected to bring further performance and scalability improvements.

How to Install PyTorch 2.0

You can simply install a newer version of PyTorch by using pip.

Copy and paste the below command into your terminal.

For GPUs: CUDA 11.8

It turns out that newer generations of GPUs show drastically better performance.

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url

For GPUs: CUDA 11.7

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url

For CPUs:

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url

For Verification:

git clone
cd tools/dynamo
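As a lightweight alternative check from Python, the sketch below confirms that the installed version is 2.x and exposes `torch.compile`. It is not the verification script from the PyTorch repository, and the helper name `supports_torch_compile` is hypothetical.

```python
import torch

def supports_torch_compile() -> bool:
    """Hypothetical helper: True if this install is 2.x and has torch.compile."""
    major = int(torch.__version__.split(".")[0])
    return major >= 2 and hasattr(torch, "compile")

print("PyTorch version:", torch.__version__)
print("torch.compile available:", supports_torch_compile())
print("CUDA available:", torch.cuda.is_available())
```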

Accelerating Hugging Face with PyTorch 2.0

Let’s try the torch.compile function to accelerate Hugging Face transformers. You can make your Hugging Face code run faster with a single-line decorator.

Note: With torch.compile(), we have seen a 30%-200% performance boost on training - TorchDynamo Performance Dashboard.

In this example, we will apply torch.compile to the “dolly-v2-3b” large language model for faster inference. To run the code in Google Colab, we first have to install the necessary Python libraries.

%pip install transformers accelerate xformers

Then, we will download and load the tokenizer and language model using Hugging Face transformers. After that, we will pass the model (an nn.Module) to the torch.compile() function.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b", device_map="auto", torch_dtype=torch.bfloat16
)
model = torch.compile(model)  # the only extra line of code required

In the final step, we will convert text into tokens using tokenizer, feed it to model.generate, and then decode the generated output into text using tokenizer.batch_decode.

prompt = "I love you because"
inputs = tokenizer(prompt, return_tensors="pt").to(device="cuda:0")

# Generate
generate_ids = model.generate(inputs.input_ids, max_length=50)

# Decode the generated token IDs back into text
tokenizer.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
)[0]

As we can see, the “dolly-v2” has completed the sentence by adding, “you are a good person…”

"I love you because you are a good person.
You are kind, you help others, you are honest, you are loyal, you are humble, you are humble, you are humble.
You are a good person."

It also works with the Hugging Face pipeline. Just provide a task type, model, and tokenizer.

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
generator("What is the name of Germany's Capital?")


[{'generated_text': "What is the name of Germany's Capital? 
The name of Germany's Capital is Berlin."}]

The compile function works out of the box with transformers, accelerate, and TIMM Python libraries.

If you're uncertain about where to begin your deep learning and AI career, the Machine Learning Scientist with Python career track is an excellent starting point. It provides a comprehensive overview of crucial Python skills that can help you secure a job as a machine learning scientist.

Abid Ali Awan

I am a certified data scientist who enjoys building machine learning applications and writing blogs on data science. I am currently focusing on content creation, editing, and working with large language models.