Garrett is the author of Hands-On Programming with R and R for Data Science from O'Reilly Media. He is a Data Scientist at RStudio and holds a Ph.D. in Statistics, but specializes in teaching. He's taught people how to use R at over 50 government agencies, small businesses, and multi-billion dollar global companies; and he's designed RStudio's training materials for R, Shiny, dplyr and more and is a frequent contributor to the RStudio blog. He wrote the popular lubridate package for R.
This course builds on what you learned in Data Manipulation in R with dplyr by showing you how to combine data sets with dplyr's two table verbs. In the real world, data comes split across many data sets, but dplyr's core functions are designed to work with single tables of data. In this course, you'll learn the best ways to combine data sets into single tables. You'll learn how to augment columns from one data set with columns from another with mutating joins, how to filter one data set against another with filtering joins, and how to sift through data sets with set operations. Along the way, you'll discover the best practices for building data sets and troubleshooting joins with dplyr. Afterwards, you’ll be well on your way to data manipulation mastery!
Mutating joins add new variables to one dataset from another dataset, matching observations across rows in the process. This chapter will explain the various ways you can join datasets together and what happens when you do.
Filtering joins and set operations combine information from datasets without adding new variables. Filtering joins filter the observations of one dataset based on whether or not they occur in a second dataset. Set operations use combinations of observations from both datasets to create a new dataset.
This chapter will show you how to build datasets from basic elements: vectors, lists, and individual datasets that do not require a join. dplyr contains a set of functions for assembling data that work more intuitively than base R's functions. The chapter will also look at when dplyr does and does not use data type coercion.
Now that you have the basics, let's dive deep into the mechanics of joins. This chapter will show you how to spot common join problems, how to join based on multiple or mismatched keys, how to join multiple tables, and how to recreate dplyr's joins with SQL and base R.
You know the ins and outs of two-table verbs with dplyr, but your knowledge is untried! Let's cement what you've learned with a real world application.