Now that you have honed your visualization skills with ggplot2, it's time to tackle larger datasets. In this course, you will learn several techniques for visualizing big data, with particular focus on faceting, a scalable visualization technique. You will learn how to put this technique into action using the Trelliscope approach as implemented in the trelliscopejs R package. Trelliscope plugs seamlessly into standard R workflows and produces interactive visualizations that let you explore your data in detail. By the end of this course, you will be able to easily create interactive exploratory displays of large datasets that will help you and your colleagues gain new insights into your data.
General strategies for visualizing big data
Learn different strategies for plotting big data using ggplot2, including calculating and plotting summary statistics, techniques for dealing with overplotting, and the principles of small multiples with faceting, which lead into Trelliscope.
Visualizing summaries
Daily ride counts
Distribution of cab fare amount
Distribution of payment type
Adding more detail to summaries
Relationship between trip duration and total fare
Faceting daily rides
Tip amount distribution faceted by payment type
Visualizing subsets
Comparing fare distribution by payment type
Visualizing all subsets
ggplot2 + TrelliscopeJS
In the previous chapter, you saw how faceting can be a powerful technique for visualizing large amounts of data that can be naturally partitioned in some meaningful way. Now, using the trelliscopejs package with ggplot2, you will learn how to create faceted visualizations when the number of partitions in the data becomes too large to view effectively on a single screen.
Faceting with TrelliscopeJS
Trelliscope faceting gapminder by country
Interacting with the TrelliscopeJS displays
Interacting with the display
Additional TrelliscopeJS features
Customizing the gapminder display
Examining the new cognostics
Adding your own cognostics
Adding custom cognostics
Interpreting custom cognostics
Trelliscope in the Tidyverse
The ggplot2 + trelliscopejs interface is easy to use, but trelliscopejs also provides a faceted plotting mechanism that gives you much more flexibility in which plotting system you use and how you specify cognostics. You will learn all about that in this chapter!
Trelliscope in the tidyverse
Grouping and nesting
Stock price display
Exploring the display
Cognostics
Adding cognostics
Cognostics from nested data frames
Navigating stock plots with new cognostics
Trelliscope options
Customizing the stock display
Visualizing databases of images
Visualizing Pokemon
The most powerful Pokemon
Case Study: Exploring Montreal BIXI Bike Data
The Montreal BIXI bike network provides open data for every bike ride, including the date, time, duration, and start and end stations of the ride. In this chapter, you will analyze data from over 4 million bike rides taken between 546 stations in 2017. The data raises many exploratory questions, and you will explore them with visualizations ranging from summary statistics to detailed Trelliscope displays that give you new insight into the data.
Montreal BIXI bike data
Number of daily rides
Examining time-of-day
Effect of membership and weekday
Summary visualization recap
Daily plots
Looking at all days
Top 100 routes dataset
Augmenting the data: Route summary statistics
Visualizing the data: Counts by hour-of-day
Evaluating the visualization
Au revoir
Prerequisites: Introduction to the Tidyverse
Ryan Hafen, author of TrelliscopeJS
Ryan Hafen is a statistical consultant and a remote adjunct assistant professor in the Statistics Department at Purdue University. Ryan's research focuses on methodology, tools, and applications in exploratory analysis, statistical model building, and machine learning on large, complex datasets. He is the developer of the datadr and Trelliscope components of the Tessera project, as well as rbokeh, the R visualization interface to the Bokeh plotting library. Before working as a statistical consultant, Ryan worked at Pacific Northwest National Laboratory doing applied work on large, complex data spanning many domains, including power systems engineering, nuclear forensics, high energy physics, biology, and cybersecurity.