This course will help you take your data visualization skills beyond the basics and hone them into a powerful part of your data science toolkit. Over the lessons we will use two interesting open datasets to cover different types of data (proportions, point data, single distributions, and multiple distributions) and discuss the pros and cons of the most common visualizations. In addition, we will cover some less common alternative visualizations for these data types and how to tweak default ggplot settings to get your message across efficiently and effectively.
Proportions of a whole (Free)
In this chapter, we focus on visualizing proportions of a whole. We see that pie charts really aren't so bad, and we discuss the waffle chart and stacked bars for comparing multiple proportions.
Grammar of Graphics intro (50 xp)
Familiarizing with disease data (100 xp)
Warming up data-wrangling (100 xp)
The pie chart and its friends (50 xp)
The infamous P-I-E (100 xp)
Cleaning up the pie (100 xp)
How about a waffle? (100 xp)
When to use bars (50 xp)
Basic stacked bars (100 xp)
Ordering stack for readability (100 xp)
Categorical x-axis (100 xp)
We now shift our focus to single-observation or point data. We go over when bar charts are appropriate, what to use when they are not, and general perception-based enhancements for your charts.
Bars and dots: point data (50 xp)
Are bars appropriate? (50 xp)
Working with geom_col (100 xp)
Wrangling geom_bar (100 xp)
Point charts (50 xp)
Ordered point chart (100 xp)
Adding visual anchors (100 xp)
Faceting to show structure (100 xp)
Tuning your charts (50 xp)
Let's flip some axes (100 xp)
Cleaning up the bars (100 xp)
Converting to point chart (100 xp)
We now move on to visualizing distributional data. We expose the fragility of histograms, discuss when it is better to shift to a kernel density plot, and show how to make both plots work best for your data.
Importance of distributions (50 xp)
Orienting with the data (100 xp)
Looking at all data (100 xp)
Changing y-axis to density (100 xp)
Histogram nuances (50 xp)
Adjusting the bin numbers (100 xp)
More bars (100 xp)
Bin width by context (100 xp)
The kernel density estimator (50 xp)
Histogram to KDE (100 xp)
Putting a rug down (100 xp)
KDE with lots of data (100 xp)
To finish, we take a look at comparing multiple distributions to each other. We see why traditional box plots are very dangerous and how to easily improve them, and we investigate when you should use more advanced alternatives like beeswarm and violin plots.
Intro to comparing distributions (50 xp)
A simple boxplot (100 xp)
Adding some jitter (100 xp)
Faceting to show all colors (100 xp)
Beeswarms and violins (50 xp)
Your first beeswarm (100 xp)
Fiddling with a violin plot (100 xp)
Violins with boxplots (100 xp)
Comparing lots of distributions (100 xp)
Comparing spatially-related distributions (50 xp)
A basic ridgeline plot (100 xp)
Cleaning up your ridgelines (100 xp)
Making it rain (data points) (100 xp)
Wrap-up (50 xp)
In the following tracks: Data Visualization with R
Prerequisites: Introduction to Data Visualization with ggplot2
Nicholas Strayer
Biostatistician at Vanderbilt
I am currently a biostatistician and data scientist at Vanderbilt University. My research focuses on the fusion of machine learning and data visualization to explore and explain electronic health records data. I have worked as a data journalist in the graphics department at the New York Times, as a data scientist at the Johns Hopkins University Data Science Lab, and as a 'Data Artist in Residence' at the data visualization startup Conduce. I am active on Twitter @nicholasstrayer and blog about data science and visualization with fellow DataCamp instructor Lucy D'Agostino McGowan at Live Free or Dichotomize.