Rick Scavetta is a biologist, workshop trainer, freelance data scientist, and cofounder of Science Craft, a company dedicated to helping scientists better understand and visualize their data. Rick’s practical, hands-on exposure to a wide variety of datasets has shown him the many problems scientists face when trying to visualize their data.
This ggplot2 tutorial builds on your knowledge from the first course to produce meaningful explanatory plots. We’ll explore the last four optional layers: Statistics, Coordinates, Facets, and Themes. We’ll calculate statistics on the fly and see how Coordinates and Facets aid in communication. We’ll then produce publication-quality plots directly in R using the Themes layer. We’ll also discuss data visualization best practices with ggplot2 so that you have a sound understanding of what works and why. By the end of the course, you’ll have all the tools needed to write a custom plotting function to explore a large dataset, combining statistics and excellent visuals.
In this chapter, we’ll delve into using ggplot2 in R as a tool for graphical data analysis, progressing from simply plotting data to applying a range of statistical methods. These include several kinds of linear models, descriptive and inferential statistics (mean, standard deviation, and confidence intervals), and custom functions.
The Coordinates and Facets layers offer specific tools for communicating data efficiently and accurately. In this chapter, we’ll look at the various ways to use these two layers effectively.
Now that you’ve built high-quality plots, it’s time to make them pretty. This is the last step in the data viz process. The Themes layer will enable you to make publication-quality plots directly in R.
Once you have the technical skill to make great visualizations, it’s important that you make them as meaningful as possible. In this chapter, we’ll go over three plot types that are mostly discouraged in the data viz community: heat maps, pie charts, and dynamite plots. We’ll examine the problems with these plots and the alternatives to them.
In this case study, we’ll explore the large, publicly available California Health Interview Survey dataset from 2009. We’ll go step by step through the development of a new plotting method, the mosaic plot, which combines statistics and flexible visuals. At the end, we’ll generalize our new plotting method for use on a variety of datasets we’ve seen throughout the first two courses.