Compare Baseball Player Statistics using Visualizations
This is Aaron Judge. Judge is one of the physically largest players in Major League Baseball, standing 6 feet 7 inches (2.01 m) tall and weighing 282 pounds (128 kg). He also hit one of the hardest home runs ever recorded. How do we know this? Statcast.
Statcast is a state-of-the-art tracking system that uses high-resolution cameras and radar equipment to measure the precise location and movement of baseballs and baseball players. Introduced to all 30 major league ballparks in 2015, Statcast is revolutionizing the game. Teams are engaging in an "arms race" of data analysis, hiring analysts left and right in an attempt to gain an edge over their competition.
In this project, you're going to wrangle, analyze, and visualize historical Statcast data to compare Mr. Judge and another (extremely large) teammate of his, Giancarlo Stanton. They are similar in a lot of ways, one being that they hit a lot of home runs. Stanton and Judge led baseball in home runs in 2017, with 59 and 52, respectively. These are exceptional totals: the player in third "only" had 45 home runs.
Stanton and Judge are also different in many ways. Let's find out how they compare!
The Data
There are two CSV files, judge.csv and stanton.csv, both of which contain Statcast data for 2015-2017. Each row represents one pitch thrown to a batter.
Custom Functions
Two functions have also been provided for you to visualize home run zones:
assign_x_coord: Assigns an x-coordinate to Statcast's strike zone numbers.
assign_y_coord: Assigns a y-coordinate to Statcast's strike zone numbers.
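The bodies of these two functions are not shown in this section. As a rough sketch of the idea only, assuming Statcast's zones 1-9 form a 3x3 grid over the strike zone, numbered left to right and top to bottom, and that zones outside that grid are skipped, hypothetical versions might look like the following. The names `sketch_x_coord` and `sketch_y_coord` and the toy data are illustrative and are not the project's own code.

```python
import pandas as pd

def sketch_x_coord(row):
    """Hypothetical: map a strike-zone number (1-9) to a column index (1-3)."""
    if row.zone in [1, 4, 7]:
        return 1  # left third of the grid
    if row.zone in [2, 5, 8]:
        return 2  # middle third
    if row.zone in [3, 6, 9]:
        return 3  # right third
    return None   # assumption: zones outside the 3x3 grid are ignored

def sketch_y_coord(row):
    """Hypothetical: map a strike-zone number (1-9) to a row index (1-3)."""
    if row.zone in [1, 2, 3]:
        return 3  # top third of the grid
    if row.zone in [4, 5, 6]:
        return 2  # middle third
    if row.zone in [7, 8, 9]:
        return 1  # bottom third
    return None   # assumption: zones outside the 3x3 grid are ignored

# Toy data, made up for illustration; zone 11 lies outside the 3x3 grid.
toy = pd.DataFrame({'zone': [1, 5, 9, 11]})
toy['x'] = toy.apply(sketch_x_coord, axis=1)
toy['y'] = toy.apply(sketch_y_coord, axis=1)
print(toy)
```

Applying the functions row by row with `apply(..., axis=1)` keeps the mapping easy to read; rows that fall outside the grid end up with missing coordinates and can be dropped before plotting.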
# import all packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df_judge = pd.read_csv('judge.csv')
df_judge.head()
print(list(df_judge.columns.values), '\n')
print(df_judge.info())
df_stanton = pd.read_csv('stanton.csv')
df_stanton.head()
print(list(df_stanton.columns.values), '\n')
print(df_stanton.info())
df_judge_events_2017 = df_judge.loc[df_judge['game_year'] == 2017].events
print(df_judge_events_2017, '\n')
df_stanton_events_2017 = df_stanton.loc[df_stanton['game_year'] == 2017].events
print(df_stanton_events_2017)
# Count the number of events each player had in 2017 using the events column in each dataset.
# Filter events for the year 2017 for both datasets
df_judge_events_2017 = df_judge.loc[df_judge['game_year'] == 2017, 'events']
df_stanton_events_2017 = df_stanton.loc[df_stanton['game_year'] == 2017, 'events']
print(df_judge_events_2017.head(), '\n', df_stanton_events_2017.head(), '\n')
# Count the number of events for each player
judge_event_count_2017 = df_judge_events_2017.count()
stanton_event_count_2017 = df_stanton_events_2017.count()
judge_event_count_2017, stanton_event_count_2017
1. Calculate the number of events each player had in 2017
Count the number of events each player had in 2017 using the events column in each dataset.
How many of each event did Judge and Stanton have in 2017?
# All of Aaron Judge's batted ball events in 2017
judge_events_2017 = df_judge.loc[df_judge['game_year'] == 2017].events.value_counts()
print("Aaron Judge batted ball event totals, 2017:")
print(judge_events_2017)
# All of Giancarlo Stanton's batted ball events in 2017
stanton_events_2017 = df_stanton.loc[df_stanton['game_year'] == 2017].events.value_counts()
print("Giancarlo Stanton batted ball event totals, 2017:")
print(stanton_events_2017)
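The two printed Series are easier to compare when placed side by side. A minimal sketch of one way to do that, using small made-up stand-ins for the two `events` columns (the values below are toy data, not real Statcast totals):

```python
import pandas as pd

# Toy stand-ins for each player's 2017 'events' column; values are made up.
# None represents a pitch that did not end the plate appearance.
judge_events = pd.Series(['home_run', 'strikeout', 'strikeout', 'walk', None])
stanton_events = pd.Series(['home_run', 'home_run', 'single', 'strikeout', None])

# value_counts() skips missing values by default; concat aligns the two
# counts on the event name, and fillna(0) covers events only one player had.
comparison = pd.concat(
    [judge_events.value_counts(), stanton_events.value_counts()],
    axis=1,
    keys=['judge', 'stanton'],
).fillna(0).astype(int)

print(comparison)
```

With the real data, the same pattern should work by passing `judge_events_2017` and `stanton_events_2017` to `pd.concat` directly, since both are already `value_counts()` results.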
2. Visualize the launch angle and launch speed for each player