Course

Joining Data with dplyr

Skill Level: Basic

4.7+

Updated 05/2023

Learn to combine data across multiple tables to answer more complex questions with dplyr.

R, Data Manipulation

4 hr

13 videos

49 Exercises

4,200 XP

81,662

Statement of Accomplishment

Course Description

Often in data science, you'll encounter fascinating data that is spread across multiple tables. This course will teach you the skills you'll need to join multiple tables together to analyze them in combination. You'll practice your skills using a fun dataset about LEGOs from the Rebrickable website. The dataset contains information about the sets, parts, themes, and colors of LEGOs, but is spread across many tables. You'll work with the data throughout the course as you learn a total of six different joins! You'll learn four mutating joins: inner join, left join, right join, and full join, and two filtering joins: semi join and anti join. In the final chapter, you'll apply your new skills to Stack Overflow data covering almost 300,000 Stack Overflow questions tagged with R, including information about their answers, the date they were asked, and their score. Get ready to take your dplyr skills to the next level!

Prerequisites

Data Manipulation with dplyr

1

Joining Tables

Get started with your first joining verb: inner_join! You'll learn to join tables together to answer questions about the LEGO dataset, which contains information across many tables about the sets, parts, themes, and colors of LEGOs over time.

The inner_join verb

What columns would you join on?

Joining parts and part categories

Joining with a one-to-many relationship

Joining parts and inventories

Joining in either direction

Joining three or more tables

Joining three tables

What's the most common color?

2

Left and Right Joins

Learn two more mutating joins, the left and right join, which are mirror images of each other! You'll learn use cases for each type of join as you explore parts and colors of LEGO themes. Then, you'll explore how to join tables to themselves to understand the hierarchy of LEGO themes in the data.

The left_join verb

Left joining two sets by part and color

Left joining two sets by color

Finding an observation that doesn't have a match

The right_join verb

Which join is best?

Counting part colors

Cleaning up your count

Joining tables to themselves

Joining themes to their children

Joining themes to their grandchildren

Left joining a table to itself

3

Full, Semi, and Anti Joins

In this chapter, you'll cover three more joining verbs: full_join, semi_join, and anti_join. You'll then use these verbs to answer questions about the similarities and differences between a variety of LEGO sets.

The full_join verb

Differences between Batman and Star Wars

Aggregating each theme

Full joining Batman and Star Wars LEGO parts

Comparing Batman and Star Wars LEGO parts

The semi_join and anti_join verbs

Select the join

Something within one set but not another

What colors are included in at least one set?

Which set is missing version 1?

Visualizing set differences

Aggregating sets to look at their differences

Combining sets

Visualizing the difference: Batman and Star Wars

4

Case Study: Joins on Stack Overflow Data

Put together all the types of join you learned in this course to analyze a new dataset: Stack Overflow questions, answers, and tags. This includes calculating and visualizing trends for some notable tags like dplyr and ggplot2. You'll also master one more method for combining tables, the bind_rows verb, which stacks tables on top of each other.

Stack Overflow questions

Left joining questions and tags

Comparing scores across tags

What tags never appear on R questions?

Joining questions and answers

Finding gaps between questions and answers

Joining question and answer counts

Joining questions, answers, and tags

Average answers by question

The bind_rows verb

Joining questions and answers with tags

Binding and counting posts with tags

Visualizing questions and answers in tags

Congratulations!

Earn Statement of Accomplishment

Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

Don’t just take our word for it

4.7 from 1,211 reviews

80%

18%

2%

0%

0%

FAQs

What types of joins does this course teach?

You will learn six joins total: four mutating joins (inner, left, right, and full) and two filtering joins (semi and anti).
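
As a rough illustration of how the four mutating joins differ, here is a minimal sketch. The `sets` and `themes` tables below are made-up toy data, not the Rebrickable tables used in the course, and the column names are assumptions chosen for the example.

```r
library(dplyr)

# Hypothetical toy tables for illustration; not the course's Rebrickable data.
sets <- tibble(
  set_num  = c("A-1", "B-1", "C-1"),
  theme_id = c(1, 2, 9)
)
themes <- tibble(
  theme_id = c(1, 2, 3),
  theme    = c("Batman", "Star Wars", "City")
)

# Mutating joins add the columns of `themes` to `sets`.
# They differ only in which unmatched rows they keep.
inner <- inner_join(sets, themes, by = "theme_id")  # matched rows only
left  <- left_join(sets, themes, by = "theme_id")   # every row of sets
right <- right_join(sets, themes, by = "theme_id")  # every row of themes
full  <- full_join(sets, themes, by = "theme_id")   # every row of both tables

# Compare how many rows each join returns for these toy tables.
c(inner = nrow(inner), left = nrow(left), right = nrow(right), full = nrow(full))
```

In this toy data, set "C-1" has no matching theme, so it drops out of the inner join and appears with a missing `theme` in the left and full joins.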

What datasets are used in this course?

You work with a LEGO dataset from Rebrickable covering sets, parts, themes, and colors, and with Stack Overflow data on nearly 300,000 R-tagged questions.

What prior R knowledge do I need?

You should be familiar with the tidyverse and with using dplyr for data manipulation. Prior completion of Data Manipulation with dplyr is recommended.

When would I use filtering joins like semi join and anti join?

Semi joins keep rows that have a match in another table, and anti joins keep rows that do not. They are useful for filtering data without adding extra columns.
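
The behavior described above can be sketched with two small tables. The data and column names here are hypothetical and chosen only for illustration; they are not taken from the course datasets.

```r
library(dplyr)

# Hypothetical toy tables for illustration; not the course's Rebrickable data.
sets <- tibble(
  set_num  = c("A-1", "B-1", "C-1"),
  theme_id = c(1, 2, 9)
)
themes <- tibble(
  theme_id = c(1, 2, 3),
  theme    = c("Batman", "Star Wars", "City")
)

# semi_join keeps the rows of `sets` that have a match in `themes`.
with_theme <- semi_join(sets, themes, by = "theme_id")

# anti_join keeps the rows of `sets` that have no match in `themes`.
without_theme <- anti_join(sets, themes, by = "theme_id")

# Neither result gains the `theme` column: filtering joins only filter rows.
names(with_theme)
names(without_theme)
```

Because the result keeps only the columns of the first table, a filtering join behaves like a `filter()` whose condition is "has (or lacks) a match in the other table".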

How long does this course take to complete?

The course has 4 chapters and 49 exercises. Most learners finish it in about 3.5 hours.
