Case Study: Exploring Baseball Pitching Data in R
4 hr
5,750 XP

Course Description
This course is a case study in baseball analytics, exploratory data analysis, and the R language. It uses a rich dataset from Major League Baseball's (MLB) Statcast system to build skills in baseball analytics with R.
Throughout the course, you will use data on every pitch thrown by Zack Greinke during the 2015 MLB season. These data include information about pitch velocity, pitch type, pitch location, exit speed when the batter makes contact, the game situation (e.g. outs or ball-strike count), and the outcome of each pitch (e.g. strike, foul, home run, or walk).
By the end of the course, you will have a thorough understanding of the data and be able to create publication-quality visuals to communicate what you have found.
- 1. Exploring pitch velocities (Free)
  Velocity is a key component in the arsenal of many pitchers. In this chapter, you will examine whether there was an uptick in Zack Greinke's velocity during his impressive July in 2015. The chapter will introduce how to deal with dates, plotting distributions with histograms, and using the very handy tapply() function.
  - Did Zack Greinke pitch differently in July? (50 xp)
  - Clean the data (100 xp)
  - Check dates (100 xp)
  - Delimit dates (100 xp)
  - Subsets and histograms (50 xp)
  - Velocity distribution (100 xp)
  - Fastball velocity distribution (100 xp)
  - Distribution comparisons with color (100 xp)
  - Describe the histogram (50 xp)
  - Using tapply() for comparisons (50 xp)
  - tapply() for velocity changes (100 xp)
  - Game-by-game velocity changes (100 xp)
  - Tidying the data frame (100 xp)
  - A game-by-game line plot (100 xp)
  - Adding jittered points (100 xp)
  - Wrap-up (50 xp)
- 2. Exploring pitch types
  Pitchers throw various types of pitches with different velocities and trajectories in order to make it more difficult for the batter to hit the ball. This chapter will introduce pitch types and make heavy use of tables to examine changes to pitch type choices by Greinke in July, as well as in other important situations.
  - Pitch mix (50 xp)
  - Pitch mix tables (100 xp)
  - Pitch mix table using prop.table() (100 xp)
  - Pitch mix tables - July vs. other (100 xp)
  - Describe fastball usage (50 xp)
  - Pitch mix tables - changes in pitch type rates (100 xp)
  - Describe pitch usage (50 xp)
  - Ball-strike count and pitch usage (50 xp)
  - Ball-strike count frequency (100 xp)
  - Make a new variable (100 xp)
  - Ball-strike count in July vs. other months (100 xp)
  - Visualizing ball-strike count in July vs. other months (100 xp)
  - Cross-tabulate pitch use in ball-strike counts (100 xp)
  - Describe pitch count usage (50 xp)
  - Pitch mix late in games (100 xp)
  - Late game pitch mix - grouped barplots (100 xp)
  - Describe late game pitching (50 xp)
  - Wrap-up (50 xp)
- 3. Exploring pitch locations
  As with velocity and pitch type, pitch location can play a key role in pitching success. This chapter leverages the rich information about location provided in the MLB Statcast data to visualize changes in Greinke's pitch location choice in July and in different ball-strike counts. You will also make use of the very important for loop in the context of plotting data.
  - Pitch location and Greinke's July (50 xp)
  - Locational changes - summary (100 xp)
  - Describe the locations (50 xp)
  - Locational changes - visualization (100 xp)
  - Locational changes - plotting a grid (100 xp)
  - Binning locational data (100 xp)
  - Grid percentage question (50 xp)
  - For loops for plots (50 xp)
  - For loops and plotting locational grid proportions (100 xp)
  - Binned locational differences (100 xp)
  - Plotting zone proportion differences (100 xp)
  - Describe the figure (50 xp)
  - Location and ball-strike count (100 xp)
  - 0-2 vs. 3-0 locational changes (100 xp)
  - Plotting count-based locational differences (100 xp)
  - Wrap-up (50 xp)
- 4. Exploring batted ball outcomes
  In this chapter, you'll bring it all together. Minimizing damage on each pitch is the key to run prevention by the pitcher. Therefore, you will look closely at outcomes from pitches thrown by Greinke in different months. We'll also introduce the ggplot2 package to create high-quality visualizations of hitter exit speed when Greinke throws to different locations.
  - Batted ball outcomes - contact rate (50 xp)
  - Velocity impact on contact rate (100 xp)
  - Pitch type impact on contact rate (100 xp)
  - Velocity impact on contact by pitch type (100 xp)
  - Greinke's out pitch? (100 xp)
  - Describe 2-strike pitch usage (50 xp)
  - Impact of pitch location on contact rate (100 xp)
  - Using ggplot2 (50 xp)
  - Rethinking the use of for loops (100 xp)
  - Contact rate with ggplot2 (100 xp)
  - Adding titles and axes to ggplot2 figure (100 xp)
  - Making a heat map - visualizing hot and cold zones (100 xp)
  - Adding text for contact rate values (100 xp)
  - Batted ball outcomes - exit velocity (50 xp)
  - Contact and exit speed (100 xp)
  - Location and exit speed (100 xp)
  - Plotting exit speed as a heat map (100 xp)
  - Using tidy data and facets in ggplot2 (100 xp)
  - Wrap-up (50 xp)
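To give a feel for the July-versus-other comparisons the chapters describe, here is a minimal sketch of tapply() and prop.table() in base R. It is not taken from the course: the data frame `pitches`, its column names (`game_date`, `pitch_type`, `start_speed`, `july`), and every value in it are made up for illustration, and the real Statcast data may be organized differently.

```r
# Hypothetical toy data; names and values are assumptions, not course data.
pitches <- data.frame(
  game_date   = as.Date(c("2015-06-28", "2015-06-28", "2015-07-04",
                          "2015-07-04", "2015-07-09", "2015-08-01")),
  pitch_type  = c("FF", "SL", "FF", "CH", "FF", "SL"),
  start_speed = c(92.1, 86.4, 93.0, 88.2, 93.4, 86.9)
)

# Label each pitch as thrown in July or in another month.
pitches$july <- ifelse(
  pitches$game_date >= as.Date("2015-07-01") &
    pitches$game_date <= as.Date("2015-07-31"),
  "july", "other"
)

# Mean velocity per group: tapply(values, grouping factor, function).
mean_velo <- tapply(pitches$start_speed, pitches$july, mean)

# Pitch mix as proportions within each column (margin = 2),
# so each of the "july" and "other" columns sums to 1.
pitch_mix <- prop.table(table(pitches$pitch_type, pitches$july), margin = 2)

print(round(mean_velo, 1))
print(round(pitch_mix, 2))
```

Using `margin = 2` normalizes within each period, which makes the July and non-July pitch mixes directly comparable even when the two periods contain different numbers of pitches.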
Brian M. Mills, Assistant Professor at the University of Florida
Brian Mills is an Assistant Professor at the University of Florida, with research interests encompassing quantitative and economic analysis in sport. He earned a PhD and MA in Sport Management, an MA in Statistics, and an MA in Applied Economics from the University of Michigan. Brian has been an active contributor to the Sabermetric community through blogging about analytics and teaching how to use R to analyze baseball data.
