
Parallel Programming with Dask in Python

Learn to scale up your Python workflows to efficiently handle big data with Dask.

4 Hours, 15 Videos, 51 Exercises
4150 XP


Course Description

When working with big data, you’ll face two common obstacles: high memory use and long runtimes. The Dask library can lower your memory use by loading chunks of data only when they are needed. It can lower runtimes by using all your available computing cores in parallel. Best of all, it requires very few changes to your existing Python code. In this course, you’ll use Dask to analyze Spotify song data, process images of sign language gestures, calculate trends in weather data, analyze audio recordings, and train machine learning models on big data.
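As a rough illustration of the lazy, parallel style described above, here is a minimal sketch using `dask.delayed`. It is not taken from the course materials: `load_chunk` and `chunk_total` are hypothetical stand-ins for reading and summarizing one chunk of a large dataset, and the chunk sizes are made up.

```python
from dask import delayed


def load_chunk(chunk_id):
    # Hypothetical stand-in for reading one chunk of a large dataset
    return list(range(chunk_id * 3, chunk_id * 3 + 3))


def chunk_total(values):
    # Summarize a single chunk
    return sum(values)


# Wrapping the calls in delayed() only builds a task graph; nothing runs yet
partial_sums = [delayed(chunk_total)(delayed(load_chunk)(i)) for i in range(4)]
total = delayed(sum)(partial_sums)

# The work is carried out only when compute() is called;
# here the threaded scheduler is requested explicitly
result = total.compute(scheduler="threads")
print(result)
```

Because each chunk is loaded inside its own task, only the chunks currently being processed need to be in memory, and independent tasks can run at the same time.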

  1.

    Lazy Evaluation and Parallel Computing

    Free

    This chapter teaches you the basics of Dask and lazy evaluation. By the end, you'll be able to speed up almost any Python code by using multi-threading or parallel processing. You'll learn the difference between these two task scheduling methods and when each one is the better choice.

    Introduction to Dask
    50 xp
    Lazy evaluation
    50 xp
    Delaying functions
    100 xp
    Task graphs and scheduling methods
    50 xp
    What are the different schedulers?
    100 xp
    Plotting the task graph
    100 xp
    Building delayed pipelines
    50 xp
    Analyzing songs on Spotify
    100 xp
    How danceable are songs these days?
    100 xp
    Most popular songs
    100 xp
  4.

    Dask Machine Learning and Final Pieces

    Harness the power of Dask to train machine learning models on big data using the Dask-ML package. You'll also learn how to split Dask calculations across a mixture of processes and threads for even greater computing speed.


Datasets

Spotify Songs - CSV
Spotify Songs - Parquet
European Rainfall - HDF5
European Rainfall - Zarr
Tripadvisor Hotel Reviews
Politicians

Collaborators

James Chapman, Amy Peterson

James Fulton

Climate Informatics Researcher

James is a PhD researcher at the University of Edinburgh, where he tutors computing, machine learning, data analysis, and statistical physics. His research involves using and developing machine learning algorithms to extract space-time patterns from climate records and climate models. He has held visiting researcher roles at the University of Oxford and Queen's University Belfast, working on planet-scale data analysis and modeling, and has a master's in physics, where he specialized in quantum simulation. In a previous life, he was employed as a data scientist in the insurance sector. When not several indents deep in Python, he performs improvised comedy.
