Course Description
It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time actually analyzing it. For this reason, it is critical to become familiar with the data cleaning process and all of the tools available to you along the way. This course provides a very basic introduction to cleaning data in R using the tidyr, dplyr, and stringr packages. After taking the course you'll be able to go from raw data to awesome insights as quickly and painlessly as possible!
1. Introduction and exploring raw data
This chapter is free. It gives you an overview of the process of data cleaning with R, then walks you through the basics of exploring raw data.
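As a rough illustration of what exploring raw data can look like, here is a minimal sketch in base R. The data frame `raw` and its columns are invented for this example, and the choice of functions is an assumption about a typical first look, not a list taken from the course.

```r
# Hypothetical raw data, invented for illustration only
raw <- data.frame(
  id     = 1:6,
  height = c(1.62, 1.75, 1.80, 1.68, 17.1, 1.71),  # 17.1 is a deliberate entry error
  group  = c("a", "b", "a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Get a feel for the data
class(raw)    # what kind of object it is
dim(raw)      # number of rows and columns
names(raw)    # column names

# View the structure
str(raw)      # compact view of column types and first values
summary(raw)  # per-column summaries

# Look at the data itself
head(raw, n = 3)  # first three rows
tail(raw, n = 3)  # last three rows

# Visualize the data
hist(raw$height)          # distribution of a single variable
plot(raw$id, raw$height)  # one variable against another
```

In a sketch like this, the summary and the histogram both make the out-of-range height stand out before any cleaning has been done.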
- Introduction to Cleaning Data in R (50 xp)
- The data cleaning process (50 xp)
- Here's what messy data look like (100 xp)
- Here's what clean data look like (100 xp)
- Exploring raw data (50 xp)
- Getting a feel for your data (100 xp)
- Viewing the structure of your data (100 xp)
- Exploring raw data (part 2) (50 xp)
- Looking at your data (100 xp)
- Visualizing your data (100 xp)
2. Tidying data
This chapter will give you an overview of the principles of tidy data, how to identify messy data, and what to do about it.
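To make one common symptom of messy data concrete, here is a minimal sketch using tidyr. The data frame `wide` and its values are invented for this example. The sketch uses `gather()` and `spread()`; current tidyr releases also offer `pivot_longer()` and `pivot_wider()` for the same job.

```r
library(tidyr)

# Hypothetical wide data: the headers y2019 and y2020 are values (years),
# not variable names
wide <- data.frame(
  country = c("A", "B"),
  y2019   = c(21.4, 25.1),
  y2020   = c(21.9, 25.3),
  stringsAsFactors = FALSE
)

# Gather the year columns into key-value pairs: one row per country and year
long <- gather(wide, key = "year", value = "bmi", -country)

# Spread the key-value pairs back into one column per year
wide_again <- spread(long, key = year, value = bmi)
```

After gathering, each variable (country, year, bmi) has its own column and each observation has its own row, which is the layout the tidy data principles aim for.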
- Introduction to tidy data (50 xp)
- Principles of tidy data (50 xp)
- Common symptoms of messy data (50 xp)
- Introduction to tidyr (50 xp)
- What kind of messy are the BMI data? (50 xp)
- Gathering columns into key-value pairs (100 xp)
- Spreading key-value pairs into columns (100 xp)
- Introduction to tidyr (part 2) (50 xp)
- Functions in tidyr (50 xp)
- Separating columns (100 xp)
- Uniting columns (100 xp)
- Column headers are values, not variable names (100 xp)
- Variables are stored in both rows and columns (100 xp)
- Multiple values are stored in one column (100 xp)
3. Preparing data for analysis
This chapter will teach you how to prepare your data for analysis. We will look at type conversion, string manipulation, missing and special values, and outliers and obvious errors.
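The four topics above can be sketched on a toy example. The data frame `df`, its columns, and the age cutoff of 120 are all assumptions made for illustration; the sketch uses base R plus stringr and is not taken from the course exercises.

```r
library(stringr)

# Hypothetical messy records, invented for illustration only
df <- data.frame(
  name  = c("  alice ", "BOB", "carol  "),
  age   = c("34", "", "290"),  # stored as text; "" is missing, 290 is an obvious error
  start = c("2015-01-31", "2016-02-29", "2014-12-01"),
  stringsAsFactors = FALSE
)

# Type conversion
df$age   <- as.numeric(df$age)  # the empty string becomes NA
df$start <- as.Date(df$start)   # ISO-formatted text becomes a Date

# String manipulation
df$name <- str_trim(df$name)    # trim leading and trailing whitespace
df$name <- tolower(df$name)     # consistent lower case
df$id   <- str_pad(as.character(1:3), width = 4, side = "left", pad = "0")

# Missing values
is.na(df$age)       # which ages are missing
sum(is.na(df$age))  # how many are missing

# Obvious errors: treat impossible ages as missing (the cutoff is an assumption)
df$age[!is.na(df$age) & df$age > 120] <- NA
```

Whether an implausible value should be set to missing, corrected, or dropped depends on the dataset; setting it to `NA` is only one option.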
- Type conversions (50 xp)
- Types of variables in R (100 xp)
- Common type conversions (100 xp)
- Working with dates (100 xp)
- String manipulation (50 xp)
- Trimming and padding strings (100 xp)
- Upper and lower case (100 xp)
- Finding and replacing strings (100 xp)
- Missing and special values (50 xp)
- Types of missing and special values in R (50 xp)
- Finding missing values (100 xp)
- Dealing with missing values (100 xp)
- Outliers and obvious errors (50 xp)
- Identifying outliers and obvious errors (50 xp)
- Dealing with outliers and obvious errors (100 xp)
- Another look at strange values (100 xp)
4. Putting it all together
In this chapter, you will practice everything you've learned from the first three chapters in order to clean a messy dataset using R.
- Time to put it all together! (50 xp)
- Get a feel for the data (100 xp)
- Summarize the data (100 xp)
- Take a closer look (100 xp)
- Let's tidy the data (50 xp)
- Column names are values (100 xp)
- Values are variable names (100 xp)
- Prepare the data for analysis (50 xp)
- Clean up dates (100 xp)
- A closer look at column types (100 xp)
- Column type conversions (100 xp)
- Missing, extreme, and unexpected values (50 xp)
- Find missing values (100 xp)
- An obvious error (100 xp)
- Another obvious error (100 xp)
- Check other extreme values (100 xp)
- Finishing touches (100 xp)
- Your data are clean! (50 xp)
Prerequisites
Introduction to R
Nick Carchedi, Product Manager at DataCamp