Once you've started learning tools for data manipulation and visualization like dplyr and ggplot2, this course gives you a chance to put them into action on a real dataset. You'll explore the historical voting record of the United Nations General Assembly, including analyzing differences in voting between countries, across time, and among international issues. In the process you'll gain more practice with the dplyr and ggplot2 packages, learn about the broom package for tidying model output, and experience the kind of start-to-finish exploratory analysis common in data science.
Data cleaning and summarizing with dplyr (free)
The best way to learn data wrangling skills is to apply them to a specific case study. Here you'll learn how to clean and filter the United Nations voting dataset using the dplyr package, and how to summarize it into smaller, interpretable units.
The United Nations Voting Dataset (50 xp)
Filtering rows (100 xp)
Adding a year column (100 xp)
Adding a country column (100 xp)
Grouping and summarizing (50 xp)
Summarizing the full dataset (100 xp)
Summarizing by year (100 xp)
Summarizing by country (100 xp)
Sorting and filtering summarized data (50 xp)
Sorting by percentage of "yes" votes (100 xp)
Filtering summarized output (100 xp)
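The filter, mutate, summarize, and sort steps this chapter walks through could look roughly like the sketch below. The toy `votes` table, its column names, the vote coding (1 = yes), and the `session + 1945` year offset are all assumptions made for illustration, not the course's actual data.

```r
library(dplyr)

# Hypothetical toy data: column names and vote codes are assumptions
# (here 1 = yes, 2 = abstain, 3 = no, anything above 3 = did not vote).
votes <- tibble(
  rcid    = c(1, 1, 2, 2, 3, 3),
  session = c(1, 1, 2, 2, 2, 2),
  country = c("A", "B", "A", "B", "A", "B"),
  vote    = c(1, 3, 1, 1, 8, 2)
)

by_country <- votes %>%
  filter(vote <= 3) %>%                    # keep only rows where a vote was cast
  mutate(year = session + 1945) %>%        # assumed mapping from session to year
  group_by(country) %>%
  summarize(
    total       = n(),                     # number of votes cast
    percent_yes = mean(vote == 1)          # fraction of "yes" votes
  ) %>%
  arrange(desc(percent_yes))               # highest share of "yes" first

by_country
```

Swapping `group_by(country)` for `group_by(year)` gives the by-year summary instead, and a final `filter(total >= ...)` would drop groups with too few votes to interpret.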
Data visualization with ggplot2
Once you've cleaned and summarized data, you'll want to visualize them to understand trends and extract insights. Here you'll use the ggplot2 package to explore trends in United Nations voting within each country over time.
Visualization with ggplot2 (50 xp)
Choosing an aesthetic (50 xp)
Plotting a line over time (100 xp)
Other ggplot2 layers (100 xp)
Visualizing by country (50 xp)
Summarizing by year and country (100 xp)
Plotting just the UK over time (100 xp)
Plotting multiple countries (100 xp)
Faceting by country (50 xp)
Faceting the time series (100 xp)
Faceting with free y-axis (100 xp)
Choose your own countries (100 xp)
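As a minimal sketch of the kind of plot this chapter builds toward, the code below draws one line per country over time and facets by country with a free y-axis. The `by_year_country` table and its values are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical summarized data: one row per year and country.
by_year_country <- tibble(
  year        = rep(c(1947, 1957, 1967), times = 2),
  country     = rep(c("A", "B"), each = 3),
  percent_yes = c(0.6, 0.5, 0.4, 0.3, 0.5, 0.7)
)

# Line of percent_yes over time, one panel per country.
# scales = "free_y" lets each panel choose its own y-axis range.
p <- ggplot(by_year_country, aes(x = year, y = percent_yes)) +
  geom_line() +
  facet_wrap(~ country, scales = "free_y")

p  # printing the object draws the plot
```

Mapping `color = country` inside `aes()` instead of faceting would put all countries on a single panel, which tends to work for a handful of countries but gets crowded quickly.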
Tidy modeling with broom
While visualization helps you understand one country at a time, statistical modeling lets you quantify trends across many countries and interpret them together. Here you'll learn to use the tidyr, purrr, and broom packages to fit linear models to each country, and understand and compare their outputs.
Linear regression (50 xp)
Linear regression on the United States (100 xp)
Finding the slope of a linear regression (50 xp)
Finding the p-value of a linear regression (50 xp)
Tidying models with broom (50 xp)
Tidying a linear regression model (100 xp)
Combining models for multiple countries (100 xp)
Nesting for multiple models (50 xp)
Nesting a data frame (100 xp)
List columns (100 xp)
Unnesting (100 xp)
Fitting multiple models (50 xp)
Performing linear regression on each nested dataset (100 xp)
Tidy each linear regression model (100 xp)
Unnesting a data frame (100 xp)
Working with many tidy models (50 xp)
Filtering model terms (100 xp)
Filtering for significant countries (100 xp)
Sorting by slope (100 xp)
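The nest, fit, tidy, unnest pattern described in this chapter can be sketched as follows. The input table and its numbers are invented for illustration; only the package functions (`nest`, `map`, `tidy`, `unnest`) are real.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical summarized data: share of "yes" votes per country and year.
by_year_country <- tibble(
  country     = rep(c("A", "B"), each = 4),
  year        = rep(c(1950, 1960, 1970, 1980), times = 2),
  percent_yes = c(0.2, 0.35, 0.4, 0.6, 0.8, 0.65, 0.6, 0.4)
)

slopes <- by_year_country %>%
  nest(data = c(year, percent_yes)) %>%                      # one row per country, data in a list column
  mutate(
    model  = map(data, ~ lm(percent_yes ~ year, data = .x)), # fit one linear model per country
    tidied = map(model, tidy)                                # turn each model into a data frame of terms
  ) %>%
  unnest(tidied) %>%                                         # one row per country and model term
  filter(term == "year") %>%                                 # keep the slope, drop the intercept
  arrange(estimate)                                          # most negative trend first

slopes %>% select(country, estimate, p.value)
```

With many countries, a further `filter()` on the `p.value` column (ideally after a multiple-testing adjustment such as `p.adjust()`) would narrow the result to trends that are statistically distinguishable from zero.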
Joining and tidying
In this chapter, you'll learn to combine multiple related datasets, such as incorporating information about each resolution's topic into your vote analysis. You'll also learn how to turn untidy data into tidy data, and see how tidy data can guide your exploration of topics and countries over time.
Joining datasets (50 xp)
Joining datasets with inner_join (100 xp)
Filtering the joined dataset (100 xp)
Visualizing colonialism votes (100 xp)
Tidy data (50 xp)
Tidy data observations (50 xp)
Using gather to tidy a dataset (100 xp)
Recoding the topics (100 xp)
Summarize by country, year, and topic (100 xp)
Visualizing trends in topics for one country (100 xp)
Tidy modeling by topic and country (50 xp)
Nesting by topic and country (100 xp)
Interpreting tidy models (100 xp)
Steepest trends by topic (50 xp)
Checking models visually (100 xp)
Conclusion (50 xp)
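A rough sketch of the join-then-tidy step named in this chapter: votes are matched to per-resolution topic flags with `inner_join()`, then the wide topic columns are reshaped with `gather()` into one row per vote and topic. Both tables, the `rcid` key, and the `me`/`co` topic column names are hypothetical stand-ins.

```r
library(dplyr)
library(tidyr)

# Hypothetical votes: one row per roll call (rcid) and country.
votes <- tibble(
  rcid    = c(1, 1, 2, 3),
  country = c("A", "B", "A", "A"),
  vote    = c(1, 3, 1, 1)
)

# Hypothetical descriptions: one 0/1 flag column per topic.
descriptions <- tibble(
  rcid = c(1, 2),
  me   = c(1, 0),
  co   = c(0, 1)
)

votes_tidied <- votes %>%
  inner_join(descriptions, by = "rcid") %>%  # keeps only roll calls present in both tables
  gather(topic, has_topic, me:co) %>%        # wide topic flags -> one row per vote and topic
  filter(has_topic == 1)                     # keep only the topics each vote belongs to

votes_tidied
```

`gather()` is used here because it is the function the chapter names; in current tidyr it is superseded by `pivot_longer()`, which would express the same reshape as `pivot_longer(me:co, names_to = "topic", values_to = "has_topic")`.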
Prerequisites: Introduction to Data Visualization with ggplot2
Principal Data Scientist at Heap