Plamena Zhekova has completed
Case Study: Exploratory Data Analysis in R
Duration: 4 hours
4,800 XP

Course Description
Once you've started learning data manipulation and visualization tools such as dplyr and ggplot2, this course gives you a chance to put them into action on a real dataset. You'll explore the historical voting record of the United Nations General Assembly, analyzing how votes differ between countries, across time, and among international issues. Along the way you'll get more practice with the dplyr and ggplot2 packages, learn to use the broom package for tidying model output, and experience the kind of start-to-finish exploratory analysis common in data science.
- 1. Data cleaning and summarizing with dplyr (Free)
  The best way to learn data wrangling skills is to apply them to a specific case study. Here you'll learn how to clean and filter the United Nations voting dataset using the dplyr package, and how to summarize it into smaller, interpretable units.
  - The United Nations Voting Dataset (50 XP)
  - Filtering rows (100 XP)
  - Adding a year column (100 XP)
  - Adding a country column (100 XP)
  - Grouping and summarizing (50 XP)
  - Summarizing the full dataset (100 XP)
  - Summarizing by year (100 XP)
  - Summarizing by country (100 XP)
  - Sorting and filtering summarized data (50 XP)
  - Sorting by percentage of "yes" votes (100 XP)
  - Filtering summarized output (100 XP)
- 2. Data visualization with ggplot2
  Once you've cleaned and summarized data, you'll want to visualize it to understand trends and extract insights. Here you'll use the ggplot2 package to explore trends in United Nations voting within each country over time.
  - Visualization with ggplot2 (50 XP)
  - Choosing an aesthetic (50 XP)
  - Plotting a line over time (100 XP)
  - Other ggplot2 layers (100 XP)
  - Visualizing by country (50 XP)
  - Summarizing by year and country (100 XP)
  - Plotting just the UK over time (100 XP)
  - Plotting multiple countries (100 XP)
  - Faceting by country (50 XP)
  - Faceting the time series (100 XP)
  - Faceting with free y-axis (100 XP)
  - Choose your own countries (100 XP)
- 3. Tidy modeling with broom
  While visualization helps you understand one country at a time, statistical modeling lets you quantify trends across many countries and interpret them together. Here you'll learn to use the tidyr, purrr, and broom packages to fit linear models to each country, and understand and compare their outputs.
  - Linear regression (50 XP)
  - Linear regression on the United States (100 XP)
  - Finding the slope of a linear regression (50 XP)
  - Finding the p-value of a linear regression (50 XP)
  - Tidying models with broom (50 XP)
  - Tidying a linear regression model (100 XP)
  - Combining models for multiple countries (100 XP)
  - Nesting for multiple models (50 XP)
  - Nesting a data frame (100 XP)
  - List columns (100 XP)
  - Unnesting (100 XP)
  - Fitting multiple models (50 XP)
  - Performing linear regression on each nested dataset (100 XP)
  - Tidy each linear regression model (100 XP)
  - Unnesting a data frame (100 XP)
  - Working with many tidy models (50 XP)
  - Filtering model terms (100 XP)
  - Filtering for significant countries (100 XP)
  - Sorting by slope (100 XP)
- 4. Joining and tidying
  In this chapter, you'll learn to combine multiple related datasets, such as incorporating information about each resolution's topic into your vote analysis. You'll also learn how to turn untidy data into tidy data, and see how tidy data can guide your exploration of topics and countries over time.
  - Joining datasets (50 XP)
  - Joining datasets with inner_join (100 XP)
  - Filtering the joined dataset (100 XP)
  - Visualizing colonialism votes (100 XP)
  - Tidy data (50 XP)
  - Tidy data observations (50 XP)
  - Using gather to tidy a dataset (100 XP)
  - Recoding the topics (100 XP)
  - Summarize by country, year, and topic (100 XP)
  - Visualizing trends in topics for one country (100 XP)
  - Tidy modeling by topic and country (50 XP)
  - Nesting by topic and country (100 XP)
  - Interpreting tidy models (100 XP)
  - Steepest trends by topic (50 XP)
  - Checking models visually (100 XP)
  - Conclusion (50 XP)

Prerequisites
Introduction to Data Visualization with ggplot2

Instructor
David Robinson, Principal Data Scientist at Heap
Dave is the Principal Data Scientist at Heap. He has worked as a data scientist at DataCamp and Stack Overflow, and received his PhD in Quantitative and Computational Biology from Princeton University. Follow him at @drob on Twitter or on his blog, Variance Explained.