Mastery requires practice. Having completed Statistical Thinking I and II, you have developed a probabilistic mindset and the hacker stats skills to extract actionable insights from your data. Your foundation is in place, and now it is time to practice your craft. In this course, you will apply your statistical thinking skills (exploratory data analysis, parameter estimation, and hypothesis testing) to two new real-world data sets. First, you will explore data from the 2013 and 2015 FINA World Aquatics Championships, where you will quantify the relative speeds and variability among swimmers. You will then perform a statistical analysis to assess the "current controversy" of the 2013 Worlds, in which swimmers claimed that a slight current in the pool was affecting results. Second, you will study the frequency and magnitudes of earthquakes around the world. Finally, you will analyze the changes in seismicity in the US state of Oklahoma after the practice of high-pressure wastewater injection at oil extraction sites became commonplace in the last decade. As you work with these data sets, you will take vital steps toward mastery as you cement your existing knowledge and broaden your ability to use statistics and Python to make sense of your data.
To begin, you'll use two data sets from Caltech researchers to review the key points of Statistical Thinking I and II and prepare for the case studies that follow!
In this chapter, you will practice your EDA, parameter estimation, and hypothesis testing skills on the results of the 2015 FINA World Swimming Championships.
Some swimmers said that they felt it was easier to swim in one direction than the other at the 2013 World Championships. Some analysts have posited that there was a swirling current in the pool. In this chapter, you'll investigate this claim! References: Quartz Media, Washington Post, SwimSwam, and Cornett et al.
In this chapter, you'll use your statistical thinking skills to study the frequency and magnitudes of earthquakes. Along the way, you'll learn some basic statistical seismology, including the Gutenberg-Richter law. This exercise highlights two key ideas about data science: 1) As a data scientist, you wander into all sorts of domain-specific analyses, which is very exciting; you constantly get to learn. 2) You are sometimes faced with limited data, as is the case for many of these earthquake studies, yet you can still make good progress!
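The Gutenberg-Richter law states that the number N of earthquakes with magnitude at least M follows log10(N) = a - bM. Equivalently, magnitudes above a completeness threshold m_t are exponentially distributed, which leads to a simple maximum-likelihood estimate of b (often attributed to Aki): b = 1 / (ln(10) * (mean(m) - m_t)). The sketch below applies that estimator with a bootstrap confidence interval. It is one common approach, not necessarily the one used in the course. The helper names `b_value` and `bootstrap_b_ci` are hypothetical, and the catalog is synthetic rather than real data. The code also assumes continuous magnitudes; real catalogs usually report magnitudes to 0.1, and a binning correction is then often applied to m_t.

```python
import numpy as np


def b_value(mags, mt):
    """Maximum-likelihood (Aki) estimate of the Gutenberg-Richter b-value.

    Only magnitudes at or above the completeness threshold `mt` are used.
    Assumes continuous (unbinned) magnitudes.
    """
    m = mags[mags >= mt]
    return 1.0 / (np.log(10) * (m.mean() - mt))


def bootstrap_b_ci(mags, mt, n_reps=1000, seed=0):
    """95% percentile bootstrap confidence interval for the b-value."""
    rng = np.random.default_rng(seed)
    m = mags[mags >= mt]
    reps = np.empty(n_reps)
    for i in range(n_reps):
        # Resample the complete part of the catalog with replacement
        sample = rng.choice(m, size=len(m), replace=True)
        reps[i] = b_value(sample, mt)
    return np.percentile(reps, [2.5, 97.5])


# Synthetic catalog (hypothetical, not real data): under Gutenberg-Richter,
# magnitudes above mt are exponential with mean 1 / (b * ln 10).
rng = np.random.default_rng(42)
b_true, mt = 1.0, 3.0
mags = mt + rng.exponential(scale=1.0 / (b_true * np.log(10)), size=10_000)

b_hat = b_value(mags, mt)
conf_int = bootstrap_b_ci(mags, mt)
print(f"b = {b_hat:.2f}, 95% CI = [{conf_int[0]:.2f}, {conf_int[1]:.2f}]")
```

The bootstrap reflects the hacker-stats style of the earlier courses: instead of deriving a standard error analytically, it resamples the catalog and recomputes the estimate many times.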
Of course, earthquakes have a big impact on society, and they have recently been connected to human activity. In this final chapter, you'll investigate the effect that increased injection of saline wastewater from oil extraction in Oklahoma has had on the region's seismicity.