This is Aaron Judge. Judge is one of the physically largest players in Major League Baseball standing 6 feet 7 inches (2.01 m) tall and weighing 282 pounds (128 kg). He also hit one of the hardest home runs ever recorded. How do we know this? Statcast.
Statcast is a state-of-the-art tracking system that uses high-resolution cameras and radar equipment to measure the precise location and movement of baseballs and baseball players. Introduced in 2015 to all 30 major league ballparks, Statcast data is revolutionizing the game. Teams are engaging in an "arms race" of data analysis, hiring analysts left and right in an attempt to gain an edge over their competition.
In this project, you're going to wrangle, analyze, and visualize Statcast historical data to compare Mr. Judge and another (extremely large) teammate of his, Giancaro Stanton. They are similar in a lot of ways, one being that they hit a lot of home runs. Stanton and Judge led baseball in home runs in 2017, with 59 and 52, respectively. These are exceptional totals - the player in third "only" had 45 home runs.
Stanton and Judge are also different in many ways. Let's find out how they compare!
The Data
There are two CSV files, judge.csv
and stanton.csv
, both of which contain Statcast data for 2015-2017. Each row represents one pitch thrown to a batter.
Custom Functions
Two functions have also been provided for you to visualize home rome zones
assign_x_coord
: Assigns an x-coordinate to Statcast's strike zone numbers.assign_y_coord
: Assigns a y-coordinate to Statcast's strike zone numbers.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
judge = pd.read_csv('judge.csv')
stanton = pd.read_csv('stanton.csv')
pd.set_option('display.max_columns', None)
def assign_x_coord(row):
"""
Assigns an x-coordinate to Statcast's strike zone numbers. Zones 11, 12, 13,
and 14 are ignored for plotting simplicity.
"""
if row.zone in [1, 4, 7]:
return 1
if row.zone in [2, 5, 8]:
return 2
if row.zone in [3, 6, 9]:
return 3
def assign_y_coord(row):
"""
Assigns a y-coordinate to Statcast's strike zone numbers. Zones 11, 12, 13,
and 14 are ignored for plotting simplicity.
"""
if row.zone in [1, 2, 3]:
return 3
if row.zone in [4, 5, 6]:
return 2
if row.zone in [7, 8, 9]:
return 1
# Display the last five rows of the Aaron Judge file
judge.tail()
# Start coding here. Use as many cells as you like!
judge_events_2017 = judge.loc[judge['game_year'] == 2017].events.value_counts()
print("Aaron Judge batted ball event totals, 2017:")
print(judge_events_2017)
stanton_events_2017 = stanton.loc[stanton['game_year'] == 2017].events.value_counts()
print("\nGiancarlo Stanton batted ball event totals, 2017:")
print(stanton_events_2017)
judge_hr = judge.loc[judge['events'] == 'home_run']
stanton_hr = stanton.loc[stanton['events'] == 'home_run']
fig1, ax1 = plt.subplots(ncols=2, sharex=True, sharey=True)
sns.kdeplot(x=judge_hr.launch_angle, y=judge_hr.launch_speed, cmap="Blues", shade=True, shade_lowest=False, ax=ax1[0]).set_title('Aaron Judge\nHome Runs, 2015-2017')
sns.kdeplot(x=stanton_hr.launch_angle, y=stanton_hr.launch_speed, cmap="Blues", shade=True, shade_lowest=False, ax=ax1[1]).set_title('Giancarlo Stanton\nHome Runs, 2015-2017')
player_hr = "Stanton"
judge_stanton_hr = pd.concat([judge_hr, stanton_hr])
sns.boxplot(x='player_name', y='release_speed', color='tab:blue', data=judge_stanton_hr).set_title('Home Runs, 2015-2017')
player_fast = "Judge"
judge_strike_hr = judge_hr.copy().loc[judge_hr.zone <= 9]
judge_strike_hr['zone_x'] = judge_strike_hr.apply(assign_x_coord, axis=1)
judge_strike_hr['zone_y'] = judge_strike_hr.apply(assign_y_coord, axis=1)
plt.hist2d(judge_strike_hr.zone_x, judge_strike_hr.zone_y, bins = 3, cmap='Blues')
plt.title('Aaron Judge Home Runs on\n Pitches in the Strike Zone, 2015-2017')
plt.gca().get_xaxis().set_visible(False)
plt.gca().get_yaxis().set_visible(False)
cb = plt.colorbar()
cb.set_label('Counts in Bin')
stanton_strike_hr = stanton_hr.copy().loc[stanton_hr.zone <= 9]
stanton_strike_hr['zone_x'] = stanton_strike_hr.apply(assign_x_coord, axis=1)
stanton_strike_hr['zone_y'] = stanton_strike_hr.apply(assign_y_coord, axis=1)
plt.hist2d(stanton_strike_hr.zone_x, stanton_strike_hr.zone_y, bins = 3, cmap='Blues')
plt.title('Giancarlo Stanton Home Runs on\n Pitches in the Strike Zone, 2015-2017')
plt.gca().get_xaxis().set_visible(False)
plt.gca().get_yaxis().set_visible(False)
cb = plt.colorbar()
cb.set_label('Counts in Bin')