Often in data science, you'll encounter fascinating data that is spread across multiple tables. This course teaches you the skills you need to join multiple tables together and analyze them in combination. You'll practice on a fun dataset about LEGOs from the Rebrickable website. The dataset contains information about the sets, parts, themes, and colors of LEGOs, spread across many tables. You'll work with the data throughout the course as you learn six different joins: four mutating joins (inner join, left join, right join, and full join) and two filtering joins (semi join and anti join). In the final chapter, you'll apply your new skills to Stack Overflow data containing each of the almost 300,000 Stack Overflow questions that are tagged with R, including information about their answers, the date they were asked, and their score. Get ready to take your dplyr skills to the next level!
Get started with your first joining verb: inner_join! You'll learn to join tables together to answer questions about the LEGO dataset, which contains information across many tables about the sets, parts, themes, and colors of LEGOs over time.

The inner_join verb (50 xp)
What columns would you join on? (50 xp)
Joining parts and part categories (100 xp)
Joining with a one-to-many relationship (50 xp)
Joining parts and inventories (100 xp)
Joining in either direction (100 xp)
Joining three or more tables (50 xp)
Joining three tables (100 xp)
What's the most common color? (100 xp)
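As a rough illustration of what this chapter covers, here is a minimal sketch of inner_join on two tiny tables. The table contents and column names (part_cat_id, id, name) are made up for this example and are only assumed to resemble the LEGO data; they are not taken from the course.

```r
library(dplyr)

# Hypothetical miniature tables; names and values are assumptions for illustration.
parts <- tibble(
  part_num = c("3001", "3002", "9999"),
  name = c("Brick 2 x 4", "Brick 2 x 3", "Mystery part"),
  part_cat_id = c(11, 11, 99)
)
part_categories <- tibble(
  id = c(11, 12),
  name = c("Bricks", "Plates")
)

# Match parts$part_cat_id to part_categories$id. Only rows with a match in
# both tables are kept; suffix disambiguates the two "name" columns.
joined <- parts %>%
  inner_join(part_categories,
             by = c("part_cat_id" = "id"),
             suffix = c("_part", "_category"))

print(joined)
```

The part with category 99 has no matching category and is dropped, which is the defining behavior of an inner join.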
Left and Right Joins
Learn two more mutating joins, the left and right join, which are mirror images of each other! You'll learn use cases for each type of join as you explore parts and colors of LEGO themes. Then, you'll explore how to join tables to themselves to understand the hierarchy of LEGO themes in the data.

The left_join verb (50 xp)
Left joining two sets by part and color (100 xp)
Left joining two sets by color (100 xp)
Finding an observation that doesn't have a match (100 xp)
The right_join verb (50 xp)
Which join is best? (100 xp)
Counting part colors (100 xp)
Cleaning up your count (100 xp)
Joining tables to themselves (50 xp)
Joining themes to their children (100 xp)
Joining themes to their grandchildren (100 xp)
Left-joining a table to itself (100 xp)
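The self-join idea can be sketched on a small, invented themes table. The columns id, name, and parent_id are assumptions chosen to mimic a parent/child hierarchy; the real course data may differ.

```r
library(dplyr)

# Hypothetical theme hierarchy: parent_id points at the id of the parent theme.
themes <- tibble(
  id = c(1, 2, 3, 4),
  name = c("Technic", "Arctic Technic", "Competition", "City"),
  parent_id = c(NA, 1, 1, NA)
)

# Inner self-join: pair each theme with the rows that name it as their parent.
parent_child <- themes %>%
  inner_join(themes,
             by = c("id" = "parent_id"),
             suffix = c("_parent", "_child"))

# Left self-join: keep every theme, filling child columns with NA
# when a theme has no children.
with_children <- themes %>%
  left_join(themes,
            by = c("id" = "parent_id"),
            suffix = c("_parent", "_child"))

print(parent_child)
print(with_children)
```

The left join keeps themes without children, so it is the version to reach for when the unmatched rows are themselves of interest.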
Full, Semi, and Anti Joins
In this chapter, you'll cover three more joining verbs: full_join, semi_join, and anti_join. You'll then use these verbs to answer questions about the similarities and differences between a variety of LEGO sets.

The full_join verb (50 xp)
Differences between Batman and Star Wars (100 xp)
Aggregating each theme (100 xp)
Full-joining Batman and Star Wars LEGO parts (100 xp)
Comparing Batman and Star Wars LEGO parts (100 xp)
The semi- and anti-join verbs (50 xp)
Select the join (100 xp)
Something within one set but not another (100 xp)
What colors are included in at least one set? (100 xp)
Which set is missing version 1? (100 xp)
Visualizing set differences (50 xp)
Aggregating sets to look at their differences (100 xp)
Combining sets (100 xp)
Visualizing the difference: Batman and Star Wars (100 xp)
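To make the difference between these three verbs concrete, here is a small sketch using two invented part-count tables. The part numbers and counts are placeholders, not values from the LEGO dataset.

```r
library(dplyr)

# Hypothetical per-theme part counts (illustrative values only).
batman <- tibble(part_num = c("3001", "3002", "3003"), n = c(5, 2, 1))
star_wars <- tibble(part_num = c("3002", "3003", "3004"), n = c(7, 3, 9))

# full_join keeps every part from either table; a count is NA where the
# part is missing from that theme.
both <- batman %>%
  full_join(star_wars, by = "part_num", suffix = c("_batman", "_star_wars"))

# semi_join filters batman to parts that also appear in star_wars.
# No columns from star_wars are added.
shared <- batman %>% semi_join(star_wars, by = "part_num")

# anti_join filters batman to parts that do not appear in star_wars.
only_batman <- batman %>% anti_join(star_wars, by = "part_num")

print(both)
print(shared)
print(only_batman)
```

Note that semi_join and anti_join return only the columns of the first table, which is why they are called filtering joins rather than mutating joins.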
Case Study: Joins on Stack Overflow Data
Put together all the types of join you learned in this course to analyze a new dataset: Stack Overflow questions, answers, and tags. This includes calculating and visualizing trends for some notable tags like dplyr and ggplot2. You'll also master one more method for combining tables, the bind_rows verb, which stacks tables on top of each other.

Stack Overflow questions (50 xp)
Left-joining questions and tags (100 xp)
Comparing scores across tags (100 xp)
What tags never appear on R questions? (100 xp)
Joining questions and answers (50 xp)
Finding gaps between questions and answers (100 xp)
Joining question and answer counts (100 xp)
Joining questions, answers, and tags (100 xp)
Average answers by question (100 xp)
The bind_rows verb (50 xp)
Joining questions and answers with tags (100 xp)
Binding and counting posts with tags (100 xp)
Visualizing questions and answers in tags (100 xp)
Congratulations! (50 xp)
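Unlike a join, bind_rows matches tables by column name and stacks their rows. A minimal sketch, assuming two invented tables with the same columns (the ids and scores are placeholders, not Stack Overflow data):

```r
library(dplyr)

# Hypothetical questions and answers with identical columns.
questions <- tibble(id = c(1, 2), score = c(10, 3))
answers <- tibble(id = c(11, 12, 13), score = c(4, 0, 8))

# Label each table before stacking so the origin of every row is preserved.
posts <- bind_rows(
  questions %>% mutate(type = "question"),
  answers %>% mutate(type = "answer")
)

# Count how many posts of each type ended up in the combined table.
post_counts <- posts %>% count(type)

print(posts)
print(post_counts)
```

Adding the type column first is one common way to keep track of which table each row came from once they are stacked.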
Prerequisites: Data Manipulation with dplyr