Cleaning Data in R

Learn to explore your data so you can properly clean and prepare it for analysis.
4 hours, 15 videos, 58 exercises, 117,340 learners

Course Description

It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it. For this reason, it is critical to become familiar with the data cleaning process and all of the tools available to you along the way. This course provides a very basic introduction to cleaning data in R using the tidyr, dplyr, and stringr packages. After taking the course you'll be able to go from raw data to awesome insights as quickly and painlessly as possible!

  1. Introduction and exploring raw data

    This chapter will give you an overview of the process of data cleaning with R, then walk you through the basics of exploring raw data.
  2. Tidying data

    This chapter will give you an overview of the principles of tidy data, how to identify messy data, and what to do about it.
  3. Preparing data for analysis

    This chapter will teach you how to prepare your data for analysis. We will look at type conversion, string manipulation, missing and special values, and outliers and obvious errors.
  4. Putting it all together

    In this chapter, you will practice everything you've learned from the first three chapters in order to clean a messy dataset using R.

Datasets: Messy weather data, BMI data, Census data, Student data (with dates)

Prerequisites: Introduction to R

Nick Carchedi

Product Manager at DataCamp
Nick is a Product Manager at DataCamp. Prior to joining DataCamp, he earned his master's degree in biostatistics at Johns Hopkins and worked as a data scientist for McKinsey. Nick's passion for teaching data science began in graduate school, where he was heavily involved in tutoring fellow students, developing the Johns Hopkins Data Science Specialization, and building the swirl R package.