Cleaning Data with PySpark

Join us for this hands-on training to learn how to clean data with Python and Apache Spark. We'll work through a dataset containing many of the common issues you are likely to encounter while preparing data for further processing or analysis. Topics include handling malformed and missing data, applying transformations, and an introduction to validating your datasets. The session runs for three hours, which leaves time to gain practical experience with Spark and data cleaning, and includes short breaks and Q&A throughout.

  • Slides 

  • Session notebook

  • Solution notebook

Mike Metzger

Data Engineer Consultant at Flexible Creations
