Brian Mills is an Assistant Professor at the University of Florida, with research interests encompassing quantitative and economic analysis in sport. He earned a PhD and MA in Sport Management, an MA in Statistics, and an MA in Applied Economics from the University of Michigan. Brian has been an active contributor to the Sabermetric community through blogging about analytics and teaching how to use R to analyze baseball data.
This case study in exploratory data analysis introduces a rich baseball dataset from Major League Baseball's (MLB) Statcast system to develop skills in summarizing and analyzing pitching performance using R. Throughout the course, you will use data on every pitch thrown by Zack Greinke during the 2015 MLB season. These data include information about pitch velocity, pitch type, pitch location, exit speed when the batter makes contact, the game situation (e.g. outs or ball-strike count), and the outcome of each pitch (e.g. strike, foul, home run, or walk). By the end of the course, you will have a thorough understanding of the data and be able to create publication quality visuals to communicate what you have found.
Velocity is a key component in the arsenal of many pitchers. In this chapter, you will examine whether there was an uptick in Zack Greinke's velocity during his impressive July in 2015. The chapter will introduce how to deal with dates, plotting distributions with histograms, and using the very handy
Pitchers throw various types of pitches with different velocities and trajectories in order to make it more difficult for the batter to hit the ball. This chapter will introduce pitch types and make heavy use of tables to examine changes to pitch type choices by Greinke in July, as well as in other important situations.
As with velocity and pitch type, pitch location can play a key role in pitching success. This chapter leverages the rich information about location provided in the MLB Statcast data to visualize changes in Greinke's pitch location choice in July and in different ball-strike counts. You will also make use of the very important
for loop in the context of plotting data.
In this chapter, you'll bring it all together. Minimizing damage on each pitch is the key to run prevention by the pitcher. Therefore, you will look closely at outcomes from pitches thrown by Greinke in different months. We'll also introduce the
ggplot2 package to create high quality visualizations of hitter exit speed when Greinke throws to different locations.