Course Description
This course builds on what you learned in Data Manipulation in R with dplyr by showing you how to combine data sets with dplyr's two-table verbs. In the real world, data comes split across many data sets, but dplyr's core functions are designed to work with single tables of data. In this course, you'll learn the best ways to combine data sets into single tables. You'll learn how to augment one data set with columns from another using mutating joins, how to filter one data set against another using filtering joins, and how to sift through data sets with set operations. Along the way, you'll discover best practices for building data sets and troubleshooting joins with dplyr. Afterwards, you’ll be well on your way to data manipulation mastery!
1. Mutating joins
Mutating joins add new variables to one dataset from another dataset, matching observations across rows in the process. This chapter will explain the various ways you can join datasets together and what happens when you do.
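As a rough illustration of how the four mutating joins differ, here is a minimal sketch. The `bands` and `instruments` tables are hypothetical toy data made up for this example, not the course's datasets.

```r
library(dplyr)

# Hypothetical toy tables (assumed for illustration only)
bands <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)
instruments <- data.frame(
  name = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar"),
  stringsAsFactors = FALSE
)

# left_join keeps every row of the first table; unmatched rows get NA
left <- left_join(bands, instruments, by = "name")

# right_join keeps every row of the second table
right <- right_join(bands, instruments, by = "name")

# inner_join keeps only rows whose key appears in both tables
inner <- inner_join(bands, instruments, by = "name")

# full_join keeps every row from both tables
full <- full_join(bands, instruments, by = "name")
```

With this data, `left` and `right` each have three rows, `inner` has two, and `full` has four; Mick's `plays` value is `NA` because he has no match in `instruments`.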
- Welcome to the course! (50 xp)
- The advantages of dplyr (50 xp)
- Keys (50 xp)
- Primary keys (50 xp)
- Secondary keys (50 xp)
- Multi-variable keys (50 xp)
- Joins (50 xp)
- A basic join (100 xp)
- A second join (100 xp)
- A right join (100 xp)
- Variations on joins (50 xp)
- Inner joins and full joins (100 xp)
- Pipes (100 xp)
- Practice with pipes and joins (100 xp)
- Choose your joins (100 xp)

2. Filtering joins and set operations
Filtering joins and set operations combine information from datasets without adding new variables. Filtering joins filter the observations of one dataset based on whether or not they occur in a second dataset. Set operations use combinations of observations from both datasets to create a new dataset.
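The distinction between filtering joins and set operations can be sketched as follows. All tables here are hypothetical toy data, and the song titles are placeholders rather than the contents of the course's datasets.

```r
library(dplyr)

# Hypothetical toy tables (assumed for illustration only)
bands <- data.frame(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles"),
  stringsAsFactors = FALSE
)
instruments <- data.frame(
  name = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar"),
  stringsAsFactors = FALSE
)

# Filtering joins: only rows of `bands` are returned, no columns are added
semi <- semi_join(bands, instruments, by = "name")  # rows with a match
anti <- anti_join(bands, instruments, by = "name")  # rows without a match

# Set operations expect both tables to have the same columns
album_a <- data.frame(song = c("Song 1", "Song 2", "Song 3"),
                      stringsAsFactors = FALSE)
album_b <- data.frame(song = c("Song 3", "Song 4"),
                      stringsAsFactors = FALSE)

either <- union(album_a, album_b)      # rows in either table, duplicates dropped
both   <- intersect(album_a, album_b)  # rows in both tables
only_a <- setdiff(album_a, album_b)    # rows in album_a but not in album_b

# setequal compares contents and ignores row order
same <- setequal(album_a, album_a[3:1, , drop = FALSE])
```

Note that `semi` and `anti` keep only the columns of `bands`, which is what separates them from the mutating joins.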
- Semi-joins (50 xp)
- Apply a semi-join (100 xp)
- Exploring with semi-joins (100 xp)
- A more precise way to filter? (50 xp)
- Anti-joins (50 xp)
- Apply an anti-join (100 xp)
- Apply another anti-join (100 xp)
- Which filtering join? (100 xp)
- Set operations (50 xp)
- How many songs are there? (100 xp)
- Greatest hits (100 xp)
- Live! Bootleg songs (100 xp)
- Multiple operations (100 xp)
- Unique values (50 xp)
- Comparing datasets (50 xp)
- Apply setequal (100 xp)
- Apply setequal again (100 xp)
- Comparing albums (100 xp)

3. Assembling data
This chapter will show you how to build datasets from basic elements: vectors, lists, and individual datasets that do not require a join. dplyr contains a set of functions for assembling data that work more intuitively than base R's functions. The chapter will also look at when dplyr does and does not use data type coercion.
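A small sketch of the binding functions and of one coercion rule may help here. The tables are hypothetical, and the behavior shown reflects dplyr's `bind_rows()` and `bind_cols()` as I understand them; the exact error message for incompatible types varies between dplyr versions, so the example only checks that an error occurs.

```r
library(dplyr)

# Hypothetical toy tables (assumed for illustration only)
part1 <- data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE)
part2 <- data.frame(id = 3:4, label = c("c", "d"), stringsAsFactors = FALSE)
scores <- data.frame(score = c(10, 20))

# bind_rows stacks tables; .id records which input each row came from
stacked <- bind_rows(first = part1, second = part2, .id = "source")

# bind_cols pastes tables side by side, matching rows by position only,
# so the inputs must already be in the same row order
wide <- bind_cols(part1, scores)

# Integer and double columns are combined into a double column
mixed <- bind_rows(data.frame(x = 1L), data.frame(x = 2.5))

# Numeric and character columns are not silently coerced; binding fails
clash <- tryCatch(
  bind_rows(data.frame(x = 1),
            data.frame(x = "a", stringsAsFactors = FALSE)),
  error = function(e) "error"
)
```

Because `bind_cols()` matches by position rather than by key, a join is usually the safer choice whenever a shared key column exists.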
- Binds (50 xp)
- Differences between dplyr and base R (50 xp)
- Which bind? (100 xp)
- Bind rows (100 xp)
- Bind columns (100 xp)
- Danger (50 xp)
- Build a better data frame (50 xp)
- data_frame (50 xp)
- Make a data frame (100 xp)
- Lists of columns (100 xp)
- Lists of rows (data frames) (100 xp)
- Working with data types (50 xp)
- Atomic data types (50 xp)
- dplyr's coercion rules (50 xp)
- dplyr and coercion (50 xp)
- Determining type (50 xp)
- Results (100 xp)

4. Advanced joining
Now that you have the basics, let's dive deep into the mechanics of joins. This chapter will show you how to spot common join problems, how to join based on multiple or mismatched keys, how to join multiple tables, and how to recreate dplyr's joins with SQL and base R.
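To make the ideas of mismatched key names, multi-table joins, and a base R equivalent concrete, here is a minimal sketch. The `players`, `salaries`, and `awards` tables and their column names are hypothetical, and chaining joins with the pipe is just one way to combine several tables; the course may use a different approach.

```r
library(dplyr)

# Hypothetical toy tables (assumed for illustration only)
players <- data.frame(player_id = c(1, 2, 3),
                      name = c("Ann", "Bo", "Cy"),
                      stringsAsFactors = FALSE)
salaries <- data.frame(id = c(1, 2), salary = c(100, 200))
awards <- data.frame(id = c(2, 3),
                     award = c("MVP", "Rookie"),
                     stringsAsFactors = FALSE)

# Mismatched key names: a named vector maps the key in the first table
# to the key in the second table
paid <- left_join(players, salaries, by = c("player_id" = "id"))

# Multiple tables: chain one join after another
all_info <- players %>%
  left_join(salaries, by = c("player_id" = "id")) %>%
  left_join(awards, by = c("player_id" = "id"))

# A base R counterpart of the first left join, using merge()
base_paid <- merge(players, salaries,
                   by.x = "player_id", by.y = "id", all.x = TRUE)
```

In each result the key column keeps the name from the first table (`player_id`), and players without a match get `NA` in the added columns.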
- What can go wrong? (50 xp)
- Spot the key (100 xp)
- Non-unique keys (50 xp)
- Two non-unique keys (50 xp)
- Missing keys (100 xp)
- Defining the keys (50 xp)
- Which keys? (50 xp)
- A subset of keys (100 xp)
- Mis-matched key names (100 xp)
- More mismatched names (100 xp)
- Joining multiple tables (50 xp)
- Multiple joins (50 xp)
- Join multiple tables (100 xp)
- Filter multiple tables (100 xp)
- Other implementations (50 xp)
- SQL (50 xp)
- Base R (100 xp)

5. Case study
You know the ins and outs of two-table verbs with dplyr, but your knowledge is untried! Let's cement what you've learned with a real-world application.
- Lahman's Baseball Database (50 xp)
- Universal keys? (100 xp)
- Common keys (100 xp)
- playerID (100 xp)
- Salaries (50 xp)
- Who are the players? (100 xp)
- Missing salaries (100 xp)
- Unpaid games? (100 xp)
- How many games? (100 xp)
- How many at-bats? (100 xp)
- Introducing the hall of fame (50 xp)
- Hall of fame nominations (100 xp)
- Hall of fame inductions (100 xp)
- Awards (100 xp)
- Salary (100 xp)
- Retirement (100 xp)
- Congratulations! (50 xp)
Datasets
- Aerosmith
- The Eagles
- Elvis Presley
- Hank Williams
- Jimi Hendrix
- Julie Andrews
- Michael Jackson
- Frank Sinatra and Bing Crosby
- Musicals
- The Dark Side of the Moon (Pink Floyd)
- Top selling albums in the US
- The Complete Studio Recordings
- The Song Remains the Same
- The Definitive Collection
- Lahman Names
- Live! Bootleg
- Supergroups
Join over 15 million learners and start Joining Data in R with dplyr today!