Sam Starosciak has completed

Case Study: Exploring Baseball Pitching Data in R

4 hr
5,750 XP


Course Description

This course is a case study in baseball analytics and exploratory data analysis in R. It uses a rich dataset from Major League Baseball's (MLB) Statcast system to build baseball analytics skills in the R language.

Throughout the course, you will use data on every pitch thrown by Zack Greinke during the 2015 MLB season. These data include information about pitch velocity, pitch type, pitch location, exit speed when the batter makes contact, the game situation (e.g. outs or ball-strike count), and the outcome of each pitch (e.g. strike, foul, home run, or walk).

By the end of the course, you will have a thorough understanding of the data and be able to create publication-quality visuals that communicate what you have found.

  1.

    Exploring pitch velocities

    Free

    Velocity is a key component in the arsenal of many pitchers. In this chapter, you will examine whether there was an uptick in Zack Greinke's velocity during his impressive July in 2015. The chapter introduces how to work with dates, plot distributions with histograms, and use the tapply() function to compare groups.

    Did Zack Greinke pitch differently in July?
    50 xp
    Clean the data
    100 xp
    Check dates
    100 xp
    Delimit dates
    100 xp
    Subsets and histograms
    50 xp
    Velocity distribution
    100 xp
    Fastball velocity distribution
    100 xp
    Distribution comparisons with color
    100 xp
    Describe the histogram
    50 xp
    Using tapply() for comparisons
    50 xp
    tapply() for velocity changes
    100 xp
    Game-by-game velocity changes
    100 xp
    Tidying the data frame
    100 xp
    A game-by-game line plot
    100 xp
    Adding jittered points
    100 xp
    Wrap-up
    50 xp
  3.

    Exploring pitch locations

    As with velocity and pitch type, pitch location can play a key role in pitching success. This chapter uses the detailed location information in the MLB Statcast data to visualize changes in Greinke's pitch location choices in July and in different ball-strike counts. You will also use for loops to build plots.
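
The techniques named in the chapter descriptions (date handling, tapply(), and plotting inside a for loop) can be sketched on a small made-up data frame. This is only an illustration: the column names (game_date, start_speed, px, pz, count) and every value below are assumptions, not the actual structure or contents of greinke2015.

```r
# Toy stand-in for the course data; columns and values are hypothetical.
pitches <- data.frame(
  game_date   = c("6/28/2015", "6/28/2015", "7/4/2015",
                  "7/4/2015", "7/9/2015", "7/9/2015"),
  start_speed = c(91.0, 92.0, 93.0, 94.0, 92.5, 93.5),
  px          = c(-0.5, 0.2, 0.1, -0.3, 0.4, 0.0),
  pz          = c(2.1, 2.8, 3.0, 1.9, 2.5, 2.2),
  count       = c("0-0", "0-2", "0-0", "0-2", "0-0", "0-2"),
  stringsAsFactors = FALSE
)

# Parse the date strings, then flag pitches thrown in July.
pitches$game_date <- as.Date(pitches$game_date, format = "%m/%d/%Y")
pitches$july <- ifelse(format(pitches$game_date, "%m") == "07", "july", "other")

# tapply(values, groups, fun): mean velocity for July vs. other months.
speed_by_month <- tapply(pitches$start_speed, pitches$july, mean)
print(speed_by_month)

# A for loop that draws one pitch-location panel per ball-strike count.
counts <- sort(unique(pitches$count))
par(mfrow = c(1, length(counts)))
for (ct in counts) {
  sub <- pitches[pitches$count == ct, ]
  plot(sub$px, sub$pz, main = ct, xlab = "px", ylab = "pz",
       xlim = c(-2, 2), ylim = c(0, 5))
}
```

Grouping with tapply() avoids writing a separate subset for each month, and the loop reuses one plot() call for every count, so adding more counts to the data adds panels without changing the code.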

datasets

greinke2015

collaborators

Nick Carchedi
Tom Jeon

prerequisites

Intermediate R
Brian M. Mills

Assistant Professor at the University of Florida

Brian Mills is an Assistant Professor at the University of Florida, with research interests encompassing quantitative and economic analysis in sport. He earned a PhD and MA in Sport Management, an MA in Statistics, and an MA in Applied Economics from the University of Michigan. Brian has been an active contributor to the Sabermetric community through blogging about analytics and teaching how to use R to analyze baseball data.