
Cleaning Data in R

4.4+

25 reviews

Intermediate

Learn to clean data as quickly and accurately as possible to help your business move from raw data to awesome insights.


4 hours, 13 videos, 44 exercises

47,803 learners

Statement of Accomplishment


Course Description

Overcome Common Data Problems Like Removing Duplicates in R

It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. The time spent cleaning is vital since analyzing dirty data can lead you to draw inaccurate conclusions.

In this course, you’ll learn a variety of techniques to help you clean dirty data using R. You’ll start by converting data types, applying range constraints, and dealing with full and partial duplicates to avoid double-counting.

Delve into Advanced Data Challenges

Once you’ve practiced working on common data issues, you’ll move on to more advanced challenges such as ensuring consistency in measurements and dealing with missing data. After every new concept, you’ll have the chance to complete a hands-on exercise to cement your knowledge and build your experience.

Learn to Use Record Linkage During Data Cleaning

Record linkage is used to merge datasets when the values that identify matching records have issues such as typos or different spellings. You’ll explore this technique in the final chapter and practice it by joining two restaurant review datasets into a single dataset.


In the following Tracks

Certification Available

Associate Data Scientist in R


Importing & Cleaning Data with R


1. Common Data Problems (Free)
In this chapter, you'll learn how to overcome some of the most common dirty data problems. You'll convert data types, apply range constraints to remove future data points, and remove duplicated data points to avoid double-counting.
Data type constraints (50 xp)
Common data types (100 xp)
Converting data types (100 xp)
Trimming strings (100 xp)
Range constraints (50 xp)
Ride duration constraints (100 xp)
Back to the future (100 xp)
Uniqueness constraints (50 xp)
Full duplicates (100 xp)
Removing partial duplicates (100 xp)
Aggregating partial duplicates (100 xp)
2. Categorical and Text Data
Categorical and text data are often among the messiest parts of a dataset because of their unstructured nature. In this chapter, you’ll learn how to fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.
Checking membership (50 xp)
Members only (100 xp)
Not a member (100 xp)
Categorical data problems (50 xp)
Identifying inconsistency (100 xp)
Correcting inconsistency (100 xp)
Collapsing categories (100 xp)
Cleaning text data (50 xp)
Detecting inconsistent text data (100 xp)
Replacing and removing (100 xp)
Invalid phone numbers (100 xp)
3. Advanced Data Problems
In this chapter, you’ll dive into more advanced data cleaning problems, such as ensuring that all weights are recorded in kilograms rather than pounds. You’ll also learn to verify that values have been added correctly and to keep missing values from negatively affecting your analyses.
Uniformity (50 xp)
Date uniformity (100 xp)
Currency uniformity (100 xp)
Cross field validation (50 xp)
Validating totals (100 xp)
Validating age (100 xp)
Completeness (50 xp)
Types of missingness (100 xp)
Visualizing missing data (100 xp)
Treating missing data (100 xp)
4. Record Linkage
Record linkage is a powerful technique for merging multiple datasets when their values contain typos or different spellings. In this chapter, you'll learn how to link records by calculating the similarity between strings. You’ll then use these skills to join two restaurant review datasets into one clean master dataset.
Comparing strings (50 xp)
Calculating distance (50 xp)
Small distance, small difference (100 xp)
Fixing typos with string distance (100 xp)
Generating and comparing pairs (50 xp)
Link or join? (100 xp)
Pair blocking (100 xp)
Comparing pairs (100 xp)
Scoring and linking (50 xp)
Score then select or select then score? (100 xp)
Putting it together (100 xp)
Congratulations! (50 xp)


Datasets

Zagat, Fodor's, Bike Sharing, SFO Satisfaction Survey, Customer Accounts

Collaborators

Richie Cotton

Adel Nehme

Amy Peterson

Prerequisites

Joining Data with dplyr

Maggie Matsui

Curriculum Manager at DataCamp

Maggie is a Curriculum Manager at DataCamp. She holds a Bachelor's degree in Statistics and Computer Science from Brown University, where she spent lots of time teaching math, programming, and statistics as a tutor and teaching assistant. She's passionate about teaching all things data-related and making programming accessible to everyone.

Don’t just take our word for it

4.4 from 25 reviews


John G.

9 months

This course was great. It was informative with an excellent instructor who clearly explained the information.

Tara P.

10 months

There were some really nice ideas on here and it was very helpful. I think using the assertive package is not necessary though and would like to see this updated to more base functions and ideas.

Daniel M.

12 months

Great course, very useful content, I will retake it for sure. I wished there was a second part though.

Euler A.

about 1 year

Very good material. Working with string is excellent.

Nicolas F.

about 1 year

This course was succinct, simple, and effective. I learned a ton in a short period of time.




FAQs

Why is data cleaning important?

What is record linkage?

Who needs to learn how to clean data?

Is this course suitable for beginners?
