Cleaning Data in Python Course

Name: Cleaning Data in Python
Rating: 4.7+ (4,776 reviews)

Skill Level: Intermediate

Updated 12/2025

Learn to diagnose and treat dirty data and develop the skills needed to transform your raw data into accurate insights!

Course Description

Discover How to Clean Data in Python

It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. Data cleaning is an essential step for every data scientist, as analyzing dirty data can lead to inaccurate conclusions.

In this course, you will learn how to identify, diagnose, and treat various data cleaning problems in Python, ranging from simple to advanced. You will deal with improper data types, check that your data is in the correct range, handle missing data, perform record linkage, and more!
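As a rough illustration of the "diagnose" step described above, the sketch below profiles a small table before any cleaning happens. The rides table, its column names, and the 1 to 5 rating scale are all invented for this example and are not course material.

```python
import pandas as pd

# Hypothetical ride-sharing records; every column name and value is invented.
rides = pd.DataFrame({
    "ride_id": [1, 2, 2, 4],
    "duration": ["12 minutes", "7 minutes", "7 minutes", None],
    "rating": [5, 3, 3, 9],
})

# Diagnose before treating: data types, missing values, duplicates, range problems.
report = {
    "dtypes": rides.dtypes.astype(str).to_dict(),
    "missing_per_column": rides.isna().sum().to_dict(),
    "duplicate_rows": int(rides.duplicated().sum()),
    # Assumes ratings are meant to lie on a 1 to 5 scale.
    "ratings_out_of_range": int((~rides["rating"].between(1, 5)).sum()),
}
print(report)
```

A summary like this only points at problems. Deciding how to treat each one (convert, drop, impute, or link) still depends on what the data is supposed to represent.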

Learn How to Clean Different Data Types

The first chapter of the course explores common data problems and how you can fix them. You will first learn about basic data types and how to deal with each of them. Afterward, you'll apply range constraints and remove duplicated data points.
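A minimal sketch of one data type problem of the kind this chapter mentions: numbers stored as text. The sales table and its columns are hypothetical, and the course's own exercises may use different data and steps.

```python
import pandas as pd

# Hypothetical sales table: revenue arrived as text, status as integer codes.
sales = pd.DataFrame({
    "revenue": ["$1,200", "$350", "$80"],
    "status": [1, 2, 1],
})

# Strip the currency symbol and thousands separator, then convert to integers.
sales["revenue"] = (
    sales["revenue"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype("int64")
)

# Status codes are labels, not quantities, so store them as a category.
sales["status"] = sales["status"].astype("category")

# Verify the conversion before doing arithmetic on the column.
assert sales["revenue"].dtype == "int64"
total_revenue = int(sales["revenue"].sum())
print(total_revenue)
```

Converting first matters because summing the original text column would concatenate the strings instead of adding the amounts.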

The last chapter explores record linkage, a powerful tool to merge multiple datasets. You'll learn how to link records by calculating the similarity between strings. Finally, you'll use your new skills to join two restaurant review datasets into one clean master dataset.
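To make the idea of linking records by string similarity concrete, here is a small sketch that uses Python's standard-library difflib as a stand-in. The course may rely on different libraries and similarity metrics. The restaurant names, the link_records helper, and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-to-1 similarity score, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(left, right, threshold=0.8):
    """Pair each name in `left` with its most similar name in `right`.

    Hypothetical helper: a pair is kept only if its score reaches `threshold`.
    """
    links = []
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        if similarity(name, best) >= threshold:
            links.append((name, best))
    return links


# Invented restaurant names from two imaginary review datasets.
restaurants_a = ["Arnie Morton's of Chicago", "Cafe Bizou", "Spago"]
restaurants_b = ["arnie mortons of chicago", "cafe bizou", "Campanile"]

links = link_records(restaurants_a, restaurants_b)
for left_name, right_name in links:
    print(left_name, "<->", right_name)
```

Comparing every pair of names works for a handful of rows. Larger datasets usually need some way to limit which pairs are compared, and a threshold chosen by inspecting borderline matches.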

Gain Confidence in Cleaning Data

By the end of the course, you will have the confidence to clean data of various types and use record linkage to merge multiple datasets. Cleaning data is an essential skill for data scientists. If you want to learn more about cleaning data in Python and its applications, check out the following tracks: Data Scientist with Python and Importing & Cleaning Data with Python.

What you'll learn

Assess data uniformity and integrity by applying unit conversions, cross-field validation, and assert statements
Differentiate strategies for handling missing data, such as deletion, statistical imputation, and encoding, based on the underlying pattern of missingness
Distinguish between text, categorical, numerical, and date data problems and select appropriate pandas and NumPy cleaning functions for each
Evaluate string-matching metrics and record-linkage workflows to consolidate records with fuzzy duplicates
Identify common data quality issues including incorrect data types, range violations, duplicates, inconsistent categories, and missing values
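Several of the checks listed above can be sketched in a few lines of pandas. The tables, column names, and the choice of median imputation below are illustrative assumptions, not the course's own exercises. Whether imputation is appropriate at all depends on why the values are missing.

```python
import pandas as pd

# Hypothetical weather readings recorded in mixed units.
readings = pd.DataFrame({
    "city": ["Oslo", "Lima", "Cairo"],
    "temp": [4.0, 66.2, 30.0],
    "unit": ["C", "F", "C"],
})

# Uniformity: convert the Fahrenheit rows to Celsius so one unit is used.
is_fahrenheit = readings["unit"] == "F"
readings.loc[is_fahrenheit, "temp"] = (readings.loc[is_fahrenheit, "temp"] - 32) * 5 / 9
readings.loc[is_fahrenheit, "unit"] = "C"
assert (readings["unit"] == "C").all()

# Cross-field validation: seat classes should add up to the reported total.
flights = pd.DataFrame({
    "economy": [100, 80],
    "business": [20, 10],
    "total": [120, 95],
})
consistent = flights[["economy", "business"]].sum(axis=1) == flights["total"]
inconsistent_rows = flights[~consistent]

# Missing data: one simple statistical imputation, filling with the median.
scores = pd.Series([10.0, None, 14.0])
filled = scores.fillna(scores.median())

print(readings)
print(inconsistent_rows)
print(filled.tolist())
```

The assert statement acts as a guard: if a later change to the pipeline leaves mixed units behind, the script stops instead of silently producing wrong averages.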

Prerequisites

Python Toolbox
Joining Data with pandas

Common data problems

In this chapter, you'll learn how to overcome some of the most common dirty data problems. You'll convert data types, apply range constraints to remove future data points, and remove duplicated data points to avoid double-counting.
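The range constraint and duplicate removal described here might look roughly like the following. The signups table and its columns are made up for this sketch, and dropping out-of-range rows is only one possible treatment.

```python
import pandas as pd

# Hypothetical signup log with a duplicated row and an impossible future date.
signups = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "signup_date": pd.to_datetime(
        ["2020-01-05", "2021-06-30", "2021-06-30", "2199-01-01"]
    ),
    "purchases": [3, 1, 1, 2],
})

# Range constraint: a signup cannot happen after today, so drop future rows.
today = pd.Timestamp.today().normalize()
in_range = signups[signups["signup_date"] <= today]

# Remove exact duplicate rows so the same purchases are not counted twice.
deduped = in_range.drop_duplicates()

# Confirm the constraint holds before aggregating.
assert deduped["signup_date"].max() <= today
total_purchases = int(deduped["purchases"].sum())
print(total_purchases)
```

Without the drop_duplicates step, user 2's purchase would be counted twice and the total would be inflated by one.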
