Raedy Ping has completed

Case Study: Exploratory Data Analysis in R

4 hours
4,800 XP

Course Description

Once you've started learning tools for data manipulation and visualization like dplyr and ggplot2, this course gives you a chance to put them into action on a real dataset. You'll explore the historical voting record of the United Nations General Assembly, analyzing how votes differ between countries, over time, and across international issues. In the process, you'll gain more practice with the dplyr and ggplot2 packages, learn about the broom package for tidying model output, and experience the kind of start-to-finish exploratory analysis common in data science.

1. Data cleaning and summarizing with dplyr

Free

The best way to learn data wrangling skills is to apply them to a specific case study. Here you'll learn how to clean and filter the United Nations voting dataset using the dplyr package, and how to summarize it into smaller, interpretable units.
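
As a rough illustration of the filter-then-summarize workflow described above, here is a minimal sketch on a hypothetical toy data frame. The column names (`year`, `country`, `vote`) and the vote coding (1 = yes, 2 = abstain, 3 = no, anything else invalid) are assumptions made for this example, not details confirmed by the course description.

```r
library(dplyr)

# Hypothetical toy data: one row per country per vote.
# Assumed coding: 1 = yes, 2 = abstain, 3 = no; other codes are treated as invalid.
votes <- data.frame(
  year    = c(1947, 1947, 1947, 1949, 1949, 1949),
  country = c("A", "B", "A", "A", "B", "B"),
  vote    = c(1, 3, 1, 2, 1, 8)
)

by_year <- votes %>%
  filter(vote <= 3) %>%            # keep only the assumed valid vote codes
  group_by(year) %>%               # one summary row per year
  summarize(
    total       = n(),             # number of votes cast that year
    percent_yes = mean(vote == 1)  # fraction of those votes that were "yes"
  )

print(by_year)
```

Replacing `group_by(year)` with `group_by(country)` would give the equivalent per-country summary, and `arrange(desc(percent_yes))` would sort the result by the share of "yes" votes.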

The United Nations Voting Dataset
50 xp
Filtering rows
100 xp
100 xp
100 xp
Grouping and summarizing
50 xp
Summarizing the full dataset
100 xp
Summarizing by year
100 xp
Summarizing by country
100 xp
Sorting and filtering summarized data
50 xp
Sorting by percentage of "yes" votes
100 xp
Filtering summarized output
100 xp
2. Data visualization with ggplot2

Once you've cleaned and summarized data, you'll want to visualize them to understand trends and extract insights. Here you'll use the ggplot2 package to explore trends in United Nations voting within each country over time.
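
A plot of per-country voting trends over time might look something like the sketch below. The data frame `by_year_country` and its columns are hypothetical stand-ins for a summarized dataset, and the values are invented purely for illustration.

```r
library(ggplot2)

# Hypothetical summarized data: share of "yes" votes per country per year.
by_year_country <- data.frame(
  year        = rep(c(1947, 1957, 1967), times = 2),
  country     = rep(c("A", "B"), each = 3),
  percent_yes = c(0.4, 0.5, 0.6, 0.7, 0.6, 0.5)
)

# One line per country, drawn in its own panel.
p <- ggplot(by_year_country, aes(x = year, y = percent_yes)) +
  geom_line() +
  facet_wrap(~ country)

print(p)
```

Faceting keeps each country's trend readable on its own; mapping `color = country` inside `aes()` instead would overlay the lines in a single panel.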

3. Tidy modeling with broom

While visualization helps you understand one country at a time, statistical modeling lets you quantify trends across many countries and interpret them together. Here you'll learn to use the tidyr, purrr, and broom packages to fit linear models to each country, and understand and compare their outputs.
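
One way to fit a linear model per country and gather the results into a single table is sketched below. The data are hypothetical, and the nest/map/tidy/unnest pattern shown is one common idiom with these packages rather than necessarily the exact code used in the course.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical summarized data: share of "yes" votes per country per year.
by_year_country <- data.frame(
  year        = rep(c(1947, 1957, 1967, 1977), times = 2),
  country     = rep(c("A", "B"), each = 4),
  percent_yes = c(0.40, 0.52, 0.58, 0.71, 0.70, 0.66, 0.55, 0.50)
)

country_trends <- by_year_country %>%
  nest(data = -country) %>%                 # one row per country, data in a list column
  mutate(
    model  = map(data, ~ lm(percent_yes ~ year, data = .x)),  # fit one model per country
    tidied = map(model, tidy)               # turn each model into a data frame of coefficients
  ) %>%
  unnest(tidied) %>%                        # one row per country per model term
  filter(term == "year") %>%                # keep the slope, drop the intercept
  select(country, estimate, p.value)

print(country_trends)
```

The `estimate` column here is the fitted change in `percent_yes` per year, so its sign shows whether a country's share of "yes" votes is rising or falling in the toy data.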

4. Joining and tidying

In this chapter, you'll learn to combine multiple related datasets, such as incorporating information about each resolution's topic into your vote analysis. You'll also learn how to turn untidy data into tidy data, and see how tidy data can guide your exploration of topics and countries over time.
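
The join-then-tidy idea can be sketched as follows. Both data frames, the `rcid` key, and the topic indicator columns `me` and `nu` are hypothetical names assumed for this example; `pivot_longer()` is used here as one way to reshape wide indicator columns into a tidy topic column.

```r
library(dplyr)
library(tidyr)

# Hypothetical votes: one row per country per resolution (rcid = resolution id).
votes <- data.frame(
  rcid    = c(1, 1, 2, 2),
  country = c("A", "B", "A", "B"),
  vote    = c(1, 3, 1, 1)
)

# Hypothetical resolution descriptions: one 0/1 indicator column per topic.
descriptions <- data.frame(
  rcid = c(1, 2),
  me   = c(1, 0),
  nu   = c(0, 1)
)

votes_tidied <- votes %>%
  inner_join(descriptions, by = "rcid") %>%  # attach topic indicators to each vote
  pivot_longer(c(me, nu),
               names_to  = "topic",
               values_to = "has_topic") %>%  # one row per vote per topic
  filter(has_topic == 1) %>%                 # keep only topics the resolution covers
  select(-has_topic)

print(votes_tidied)
```

With one row per vote per topic, the result can be grouped by `topic` alongside `country` or a year column and summarized in the same way as the untidied votes.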

In the following tracks

Data Manipulation

Collaborators

David Robinson

Principal Data Scientist at Heap
