This is Aaron Judge. Judge is one of the physically largest players in Major League Baseball, standing 6 feet 7 inches (2.01 m) tall and weighing 282 pounds (128 kg). He also hit one of the hardest home runs ever recorded. How do we know this? Statcast.
Statcast is a state-of-the-art tracking system that uses high-resolution cameras and radar equipment to measure the precise location and movement of baseballs and baseball players. Introduced in 2015 to all 30 major league ballparks, Statcast data is revolutionizing the game. Teams are engaging in an "arms race" of data analysis, hiring analysts left and right in an attempt to gain an edge over their competition.
In this project, you're going to wrangle, analyze, and visualize historical Statcast data to compare Mr. Judge and another (extremely large) teammate of his, Giancarlo Stanton. They are similar in a lot of ways, one being that they hit a lot of home runs. Stanton and Judge led baseball in home runs in 2017, with 59 and 52, respectively. These are exceptional totals: the player in third "only" had 45 home runs.
Stanton and Judge are also different in many ways. Let's find out how they compare!
The Data
There are two CSV files, judge.csv and stanton.csv, both of which contain Statcast data for 2015-2017. Each row represents one pitch thrown to a batter.
Custom Functions
Two functions have also been provided to help you visualize home run zones:
assign_x_coord: Assigns an x-coordinate to Statcast's strike zone numbers.
assign_y_coord: Assigns a y-coordinate to Statcast's strike zone numbers.
# Run this cell to begin
# Import the necessary packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Load Aaron Judge's Statcast data
judge = pd.read_csv('judge.csv')
# Load Giancarlo Stanton's Statcast data
stanton = pd.read_csv('stanton.csv')
# Display all columns (pandas will collapse some columns if we don't set this option)
pd.set_option('display.max_columns', None)
# Custom Functions
def assign_x_coord(row):
    """
    Assigns an x-coordinate to Statcast's strike zone numbers. Zones 11, 12, 13,
    and 14 are ignored for plotting simplicity.
    """
    # Left third of strike zone
    if row.zone in [1, 4, 7]:
        return 1
    # Middle third of strike zone
    if row.zone in [2, 5, 8]:
        return 2
    # Right third of strike zone
    if row.zone in [3, 6, 9]:
        return 3
def assign_y_coord(row):
    """
    Assigns a y-coordinate to Statcast's strike zone numbers. Zones 11, 12, 13,
    and 14 are ignored for plotting simplicity.
    """
    # Upper third of strike zone
    if row.zone in [1, 2, 3]:
        return 3
    # Middle third of strike zone
    if row.zone in [4, 5, 6]:
        return 2
    # Lower third of strike zone
    if row.zone in [7, 8, 9]:
        return 1
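The two helpers above place zones 1-9 on a 3x3 grid, with zone 1 at the top left and zone 9 at the bottom right. As a quick check of that layout, here is a minimal sketch that computes the same coordinates arithmetically. The name zone_to_grid is hypothetical and is not part of the project; it is only an illustration of the mapping, not a replacement for the provided functions.

```python
def zone_to_grid(zone):
    """Hypothetical helper: map a strike zone number (1-9) to (x, y).

    Mirrors assign_x_coord / assign_y_coord: x runs 1-3 from left to right,
    y runs 1-3 from bottom to top. Any other zone (e.g. 11-14) maps to
    (None, None), matching the helpers' implicit None return.
    """
    if zone not in range(1, 10):
        return (None, None)
    x = (zone - 1) % 3 + 1   # column: zones 1, 4, 7 -> 1; 2, 5, 8 -> 2; 3, 6, 9 -> 3
    y = 3 - (zone - 1) // 3  # row: zones 1-3 -> 3; 4-6 -> 2; 7-9 -> 1
    return (x, y)

# Coordinates for every zone inside the strike zone
grid = {zone: zone_to_grid(zone) for zone in range(1, 10)}
for zone, coords in grid.items():
    print(zone, coords)
```

Zones outside the strike zone fall through to (None, None), which is why the analysis later restricts home runs to zones 1-9 before assigning coordinates.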
# Display the last five rows of the Aaron Judge file
judge.tail()
# Display the last five rows of the Giancarlo Stanton file
stanton.tail()
# Count each player's batted ball events in 2017
judge_events_2017 = judge.query('game_year == 2017')['events'].value_counts()
stanton_events_2017 = stanton.query('game_year == 2017')['events'].value_counts()
# Keep only each player's 2017 home runs
judge_hr = judge.query('game_year == 2017 and events == "home_run"')
judge_hr
stanton_hr = stanton.query('game_year == 2017 and events == "home_run"')
# Plot the density of launch angle vs. exit velocity for each player's home runs
fig1, ax1 = plt.subplots(ncols=2, sharex=True, sharey=True)
sns.kdeplot(x=judge_hr.launch_angle, y=judge_hr.launch_speed, cmap="Blues", fill=True, thresh=0.05, ax=ax1[0]).set_title('Judge')
sns.kdeplot(x=stanton_hr.launch_angle, y=stanton_hr.launch_speed, cmap="Blues", fill=True, thresh=0.05, ax=ax1[1]).set_title('Stanton')
plt.show()
# Combine both players' home runs so a single boxplot can compare pitch velocity
judge_stanton_hr = pd.concat([judge_hr, stanton_hr])
sns.boxplot(x='player_name', y='release_speed', data=judge_stanton_hr)
# The player whose home runs came off faster pitches, as read from the boxplot
player_fast = 'Judge'
# Keep only home runs hit on pitches inside the strike zone (zones 1-9);
# copy so that adding columns does not modify a view of the original frame
judge_strike_hr = judge_hr.loc[judge_hr['zone'] <= 9].copy()
stanton_strike_hr = stanton_hr.loc[stanton_hr['zone'] <= 9].copy()
# Assign grid coordinates to each home run's strike zone number
judge_strike_hr['zone_x'] = judge_strike_hr.apply(assign_x_coord, axis=1)
judge_strike_hr['zone_y'] = judge_strike_hr.apply(assign_y_coord, axis=1)
stanton_strike_hr['zone_x'] = stanton_strike_hr.apply(assign_x_coord, axis=1)
stanton_strike_hr['zone_y'] = stanton_strike_hr.apply(assign_y_coord, axis=1)
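With zone_x and zone_y assigned, each home run falls into one cell of a 3x3 grid, and counting home runs per cell gives the "home run zone" picture the helper functions were provided for. The sketch below tallies such a grid using only the standard library. The (zone_x, zone_y) pairs are made up for illustration and are not taken from judge.csv or stanton.csv; with the real data, the pairs would come from the zone_x and zone_y columns of judge_strike_hr or stanton_strike_hr.

```python
from collections import Counter

# Made-up (zone_x, zone_y) pairs standing in for rows of judge_strike_hr
sample_cells = [(2, 2), (2, 2), (1, 2), (3, 1), (2, 3), (2, 2), (1, 1)]

# Count how many home runs landed in each grid cell
counts = Counter(sample_cells)

# Build a 3x3 grid of counts, listing the top row of the strike zone (y = 3) first
zone_grid = [[counts.get((x, y), 0) for x in (1, 2, 3)] for y in (3, 2, 1)]

for row in zone_grid:
    print(row)
```

A two-dimensional histogram of zone_x against zone_y with three bins per axis (for example, matplotlib's plt.hist2d) would render these same counts as a heat map, one panel per player.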