Case Study: Exploring Baseball Pitching Data in R

Use a rich baseball dataset from MLB's Statcast system to practice your data exploration skills.
4 Hours, 14 Videos, 69 Exercises, 8,775 Learners
5750 XP

Course Description

This course is a case study in baseball analytics, exploratory data analysis, and the R language. It introduces a rich baseball dataset from Major League Baseball's (MLB) Statcast system to develop skills in baseball analytics using the R language.

Throughout the course, you will use data on every pitch thrown by Zack Greinke during the 2015 MLB season. These data include information about pitch velocity, pitch type, pitch location, exit speed when the batter makes contact, the game situation (e.g. outs or ball-strike count), and the outcome of each pitch (e.g. strike, foul, home run, or walk).

By the end of the course, you will have a thorough understanding of the data and be able to create publication-quality visuals to communicate what you have found.

  1. Exploring pitch velocities

    Velocity is a key component in the arsenal of many pitchers. In this chapter, you will examine whether there was an uptick in Zack Greinke's velocity during his impressive July in 2015. The chapter introduces working with dates, plotting distributions with histograms, and using the very handy tapply() function.
  2. Exploring pitch types

    Pitchers throw various types of pitches with different velocities and trajectories in order to make it more difficult for the batter to hit the ball. This chapter will introduce pitch types and make heavy use of tables to examine changes to pitch type choices by Greinke in July, as well as in other important situations.
  3. Exploring pitch locations

    As with velocity and pitch type, pitch location can play a key role in pitching success. This chapter leverages the rich information about location provided in the MLB Statcast data to visualize changes in Greinke's pitch location choice in July and in different ball-strike counts. You will also make use of the very important for loop in the context of plotting data.
  4. Exploring batted ball outcomes

    In this chapter, you'll bring it all together. Minimizing damage on each pitch is the key to run prevention by the pitcher. Therefore, you will look closely at outcomes from pitches thrown by Greinke in different months. You will also be introduced to the ggplot2 package to create high-quality visualizations of hitter exit speed when Greinke throws to different locations.
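As a rough illustration of the base R tools the chapters mention (working with dates, tapply(), and tables), here is a minimal sketch. The data frame and its column names (game_date, pitch_type, start_speed) are invented for this example and are not the course's actual dataset or variable names.

```r
# Hypothetical stand-in data: six made-up pitches, NOT real Statcast records.
# Column names are assumptions chosen for illustration only.
pitches <- data.frame(
  game_date   = c("2015-06-02", "2015-06-02", "2015-06-28",
                  "2015-07-04", "2015-07-04", "2015-07-19"),
  pitch_type  = c("FF", "SL", "FF", "FF", "CH", "SL"),
  start_speed = c(91.0, 86.0, 92.0, 93.0, 88.0, 87.0),
  stringsAsFactors = FALSE
)

# Convert character dates to Date objects, then label each pitch
# as thrown in July or in another month
pitches$game_date <- as.Date(pitches$game_date, format = "%Y-%m-%d")
pitches$july <- ifelse(format(pitches$game_date, "%m") == "07", "july", "other")

# tapply() applies a function (here, mean) to velocity within each group
mean_speed <- tapply(pitches$start_speed, pitches$july, mean)

# table() counts pitch types per group; prop.table() with margin = 2
# converts the counts to proportions within each column (group)
type_counts <- table(pitches$pitch_type, pitches$july)
type_props  <- prop.table(type_counts, margin = 2)

print(mean_speed)
print(round(type_props, 2))
```

The same pattern extends to other grouping variables, such as the ball-strike count, by swapping the second argument of tapply() or table().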
Nick Carchedi, Tom Jeon
Intermediate R
Brian M. Mills

Assistant Professor at the University of Florida
Brian Mills is an Assistant Professor at the University of Florida, with research interests encompassing quantitative and economic analysis in sport. He earned a PhD and MA in Sport Management, an MA in Statistics, and an MA in Applied Economics from the University of Michigan. Brian has been an active contributor to the Sabermetric community through blogging about analytics and teaching how to use R to analyze baseball data.

What do other learners have to say?

I've used other sites—Coursera, Udacity, things like that—but DataCamp's been the one that I've stuck with.

Devon Edwards Joseph
Lloyds Banking Group

DataCamp is the top resource I recommend for learning data science.

Louis Maiden
Harvard Business School

DataCamp is by far my favorite website to learn from.

Ronald Bowers
Decision Science Analytics, USAA