(Python) Project: Compare Baseball Player Statistics using Visualizations

    This is Aaron Judge. Judge is one of the physically largest players in Major League Baseball, standing 6 feet 7 inches (2.01 m) tall and weighing 282 pounds (128 kg). He also hit one of the hardest home runs ever recorded. How do we know this? Statcast.

    Statcast is a state-of-the-art tracking system that uses high-resolution cameras and radar equipment to measure the precise location and movement of baseballs and baseball players. Introduced in 2015 to all 30 major league ballparks, Statcast data is revolutionizing the game. Teams are engaging in an "arms race" of data analysis, hiring analysts left and right in an attempt to gain an edge over their competition.

    In this project, you're going to wrangle, analyze, and visualize historical Statcast data to compare Mr. Judge and another (extremely large) teammate of his, Giancarlo Stanton. They are similar in a lot of ways, one being that they hit a lot of home runs. Stanton and Judge led baseball in home runs in 2017, with 59 and 52, respectively. These are exceptional totals - the player in third "only" had 45 home runs.

    Stanton and Judge are also different in many ways. Let's find out how they compare!

    The Data

    There are two CSV files, judge.csv and stanton.csv, both of which contain Statcast data for 2015-2017. Each row represents one pitch thrown to a batter.

    Custom Functions

    Two functions have also been provided to help you visualize home run zones:

    • assign_x_coord: Assigns an x-coordinate to Statcast's strike zone numbers.
    • assign_y_coord: Assigns a y-coordinate to Statcast's strike zone numbers.
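
The provided helpers are not reproduced on this page. As a rough sketch of what such functions might look like, the version below assumes the nine in-zone Statcast zone numbers form a 3x3 grid numbered left to right, top to bottom, and that each row exposes its zone under the key `zone`. Both the layout and the function bodies are assumptions for illustration; the project's own versions may differ.

```python
# Hypothetical re-creation of the two helpers; NOT the project's own code.
# Assumed layout of zones 1-9:
#   1 2 3
#   4 5 6
#   7 8 9
# Zones outside this grid (or missing values) get no coordinate.

def assign_x_coord(row):
    """Return the column (1 = left, 3 = right) for the row's zone, else None."""
    zone = row["zone"]
    if zone in (1, 4, 7):
        return 1
    if zone in (2, 5, 8):
        return 2
    if zone in (3, 6, 9):
        return 3
    return None


def assign_y_coord(row):
    """Return the row height (1 = bottom, 3 = top) for the row's zone, else None."""
    zone = row["zone"]
    if zone in (1, 2, 3):
        return 3
    if zone in (4, 5, 6):
        return 2
    if zone in (7, 8, 9):
        return 1
    return None


# Plain dicts stand in for DataFrame rows here
pitch = {"zone": 5}
print(assign_x_coord(pitch), assign_y_coord(pitch))
```

Because the functions only index `row["zone"]`, they accept a plain dict or a pandas row, so they could be applied to a DataFrame with `df.apply(assign_x_coord, axis=1)`.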

    # Import all packages
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Load Aaron Judge's Statcast data and inspect it
    df_judge = pd.read_csv('judge.csv')
    print(df_judge.head())
    print(list(df_judge.columns), '\n')
    df_judge.info()  # info() prints its own summary and returns None

    # Load Giancarlo Stanton's Statcast data and inspect it
    df_stanton = pd.read_csv('stanton.csv')
    print(df_stanton.head())
    print(list(df_stanton.columns), '\n')
    df_stanton.info()
    # Count the number of events each player had in 2017 using the events column in each dataset.

    # Filter the events column to the 2017 season for both players
    df_judge_events_2017 = df_judge.loc[df_judge['game_year'] == 2017, 'events']
    df_stanton_events_2017 = df_stanton.loc[df_stanton['game_year'] == 2017, 'events']
    print(df_judge_events_2017.head(), '\n')
    print(df_stanton_events_2017.head(), '\n')

    # count() ignores missing values (NaN), so only rows with a recorded event are counted
    judge_event_count_2017 = df_judge_events_2017.count()
    stanton_event_count_2017 = df_stanton_events_2017.count()

    print(judge_event_count_2017, stanton_event_count_2017)

    1. Calculate the number of events each player had in 2017

    Count the number of events each player had in 2017 using the events column in each dataset.

    How many of each event did Judge and Stanton have in 2017?

    # All of Aaron Judge's batted ball events in 2017
    judge_events_2017 = df_judge.loc[df_judge['game_year'] == 2017].events.value_counts()
    print("Aaron Judge batted ball event totals, 2017:")
    print(judge_events_2017)

    # All of Giancarlo Stanton's batted ball events in 2017
    stanton_events_2017 = df_stanton.loc[df_stanton['game_year'] == 2017].events.value_counts()
    print("Giancarlo Stanton batted ball event totals, 2017:")
    print(stanton_events_2017)
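
The two printed tallies can be hard to compare by eye. One option is to place the counts side by side in a single table. The sketch below uses short made-up event lists rather than the real judge.csv and stanton.csv data; the toy values and the name `comparison` are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for the 2017 'events' columns; NOT real Statcast data.
# None marks rows with no recorded event.
judge_events = pd.Series(["home_run", "strikeout", None, "walk", "home_run", None])
stanton_events = pd.Series(["single", "home_run", None, "strikeout", "strikeout"])

# value_counts() drops missing values by default and tallies each event type
judge_counts = judge_events.value_counts()
stanton_counts = stanton_events.value_counts()

# Align the two tallies on event name; an event one player never had becomes 0
comparison = (
    pd.concat([judge_counts, stanton_counts], axis=1, keys=["judge", "stanton"])
    .fillna(0)
    .astype(int)
)
print(comparison)
```

With the real data, `judge_events_2017` and `stanton_events_2017` from the cell above could take the place of `judge_counts` and `stanton_counts`.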

    2. Visualize the launch angle and launch speed for each player