NBA Shooting Data

    This dataset contains shooting statistics for four different players during the 2021 NBA Playoffs.

    Not sure where to begin? Scroll to the bottom to find challenges!

    import pandas as pd
    
    pd.read_csv("nba_players_shooting.csv", index_col=0)

    Data Dictionary

    Variable  Class   Description
    SHOOTER   String  Name of the player taking the shot
    X         float   Horizontal distance of the shot taken from the basket in ft
    Y         float   Vertical distance of the shot taken from the basket in ft
    RANGE     String  Radius range of the shot taken from the basket in ft
    DEFENDER  String  Name of the player defending the shot
    SCORE     String  'MADE' if shot is scored, else 'MISSED'
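A quick way to confirm that a loaded frame matches the data dictionary is to check the column names, the numeric columns, and the allowed SCORE values. The following is only a sketch: `check_schema` and `EXPECTED_COLUMNS` are hypothetical names, and the sample rows (including the player names and RANGE labels) are made up for illustration rather than taken from the CSV.

```python
import pandas as pd

# Column names as listed in the data dictionary.
EXPECTED_COLUMNS = ["SHOOTER", "X", "Y", "RANGE", "DEFENDER", "SCORE"]


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the frame looks as documented."""
    problems = []

    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")

    # X and Y are documented as floats, so they should at least be numeric.
    for col in ("X", "Y"):
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")

    # SCORE is documented as either 'MADE' or 'MISSED'.
    if "SCORE" in df.columns:
        unexpected = set(df["SCORE"].dropna().unique()) - {"MADE", "MISSED"}
        if unexpected:
            problems.append(f"unexpected SCORE values: {sorted(unexpected)}")

    return problems


# Made-up rows for illustration only; real values come from the CSV.
sample = pd.DataFrame({
    "SHOOTER": ["Player A", "Player B"],
    "X": [-2.5, 10.0],
    "Y": [6.0, 18.5],
    "RANGE": ["(5, 9)", "(20, 24)"],
    "DEFENDER": ["Player B", "Player A"],
    "SCORE": ["MADE", "MISSED"],
})
print(check_schema(sample))
```

With the real data, the same call would be `check_schema(pd.read_csv("nba_players_shooting.csv", index_col=0))`.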

    Source of dataset.

    Don't know where to start?

    Challenges are brief tasks designed to help you practice specific skills:

    • 🗺️ Explore: At what range is each player most likely to score a shot?
    • 📊 Visualize: Plot the shots made by the X and Y position on the court. For each shot, differentiate between the four different players.
    • 🔎 Analyze: Are players more likely to score a shot the closer they get to the basket?

    Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

    A university basketball team has hired you to use data to improve their performance. They want to know whether it's possible to use past data to provide tailored recommendations to players.

    As a test, you have been provided with NBA shooting data for four players. The manager of the university team has asked you whether it is possible to provide data-driven recommendations for each player based on their likelihood of making a shot. You must also include how reliable your findings are, as well as advice for each player.

    You will need to prepare a report that is accessible to a broad audience. It will need to outline your steps, findings, and conclusions.
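One possible way to express how reliable a shooting percentage is: report a confidence interval alongside it, since a percentage based on a handful of attempts is far less certain than one based on hundreds. The sketch below uses the Wilson score interval, which is a choice made here and not something the scenario prescribes. The function name `wilson_interval` and the counts in the example are hypothetical.

```python
import math


def wilson_interval(made: int, attempts: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a proportion; z=1.96 corresponds to roughly 95% confidence."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = made / attempts
    denom = 1 + z ** 2 / attempts
    centre = (p + z ** 2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(p * (1 - p) / attempts + z ** 2 / (4 * attempts ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts, not taken from the dataset: 12 made shots out of 20 attempts.
low, high = wilson_interval(made=12, attempts=20)
print(f"make rate 60%, plausible range {low:.0%} to {high:.0%}")
```

A wide interval is a signal that a recommendation for that player and range rests on too few shots to be trusted.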

    import pandas as pd
    import matplotlib.pyplot as plt
    
    df = pd.read_csv("nba_players_shooting.csv", index_col=0)
    print(df.shape)
    print(df.describe())
    print(df["SHOOTER"].unique())
    # 1. Explore: at what range is each player most likely to score a shot?
    # Start with a stacked bar chart showing the percentage of shots made and missed in each range.
    
    # Calculate Seth Curry's shooting percentage in each range
    df_seth = df[df["SHOOTER"] == "Seth Curry"]
    
    # Group the filtered dataframe by "RANGE" and "SCORE", and calculate the percentage of each score within each range
    scores_by_range = round(df_seth.groupby(["RANGE", "SCORE"])["SCORE"].count() / df_seth.groupby("RANGE")["SCORE"].count() * 100, 2)
    
    # Reshape the data to create a stacked bar chart
    scores_by_range = scores_by_range.unstack(level=1)
    
    # Move the last row to the second row
    scores_by_range = pd.concat([scores_by_range.iloc[:1], scores_by_range.iloc[-1:], scores_by_range.iloc[1:-1]])
    print(scores_by_range)
    
    # Create the stacked bar chart
    scores_by_range.plot(kind='bar', stacked=True)
    
    # Add labels and a title
    plt.xlabel("Range")
    plt.ylabel("Percentage")
    plt.title("Percentage of MADE and MISSED Scores by Range")
    plt.show()
    # Define a function that draws the same stacked bar chart of made vs. missed shots by range for any player
    def plot_shooting_percentage(df, player):
        # Filter the dataframe by player
        df_player = df[df["SHOOTER"] == player]
    
        # Group the filtered dataframe by "RANGE" and "SCORE", and calculate the percentage of each score within each range
        scores_by_range = round(df_player.groupby(["RANGE", "SCORE"])["SCORE"].count() / df_player.groupby("RANGE")["SCORE"].count() * 100, 2)
    
        # Reshape the data to create a stacked bar chart
        scores_by_range = scores_by_range.unstack(level=1)
    
        # Move the last row to the second row
        scores_by_range = pd.concat([scores_by_range.iloc[:1], scores_by_range.iloc[-1:], scores_by_range.iloc[1:-1]])
    
        # Create the stacked bar chart with custom colors
        colors = ['#5DA5DA', '#FAA43A', '#60BD68', '#F17CB0', '#B2912F', '#B276B2', '#DECF3F']
        ax = scores_by_range.plot(kind='bar', stacked=True, color=colors)
    
        # Add a light grid
        ax.grid(which="major", axis='x', color='#DAD8D7', alpha=0.5)
        ax.grid(which="major", axis='y', color='#DAD8D7', alpha=0.5)
    
        # Add labels and a title
        ax.set_xlabel("Range")
        ax.set_ylabel("Percentage")
        ax.set_title(f"Percentage of MADE and MISSED Scores by Range for {player}")
    
        # Remove the spines
        for spine in ['top', 'right', 'bottom', 'left']:
            ax.spines[spine].set_visible(False)
    
        # Move the legend to the bottom right-hand corner
        ax.legend(loc='lower right')
    
        # Rotate the x-axis tick labels to a horizontal level
        ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
        
    plot_shooting_percentage(df, "Seth Curry")    
    plot_shooting_percentage(df, "Chris Paul")
    plot_shooting_percentage(df, "Russell Westbrook")
    plot_shooting_percentage(df, "Trae Young")
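The charts show the split per player, but the Explore question can also be answered directly with one table. The sketch below computes the share of shots made for every player and range at once, then picks each player's best range. The function names `make_rate_table` and `attempts_table` are hypothetical, and the sample rows use placeholder labels ("near", "far") instead of real RANGE values.

```python
import pandas as pd


def make_rate_table(df: pd.DataFrame) -> pd.DataFrame:
    """Rows are players, columns are ranges, values are the share of shots made (0 to 1)."""
    made = df["SCORE"].eq("MADE")
    return made.groupby([df["SHOOTER"], df["RANGE"]]).mean().unstack("RANGE")


def attempts_table(df: pd.DataFrame) -> pd.DataFrame:
    """Number of shots behind each rate, to flag percentages based on very few attempts."""
    return df.groupby(["SHOOTER", "RANGE"]).size().unstack("RANGE", fill_value=0)


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "SHOOTER": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "RANGE": ["near", "near", "near", "far", "far", "near", "near", "far"],
    "SCORE": ["MADE", "MADE", "MISSED", "MISSED", "MADE", "MISSED", "MISSED", "MADE"],
})

rates = make_rate_table(sample)
best_range = rates.idxmax(axis=1)  # range with the highest make rate for each player
print(rates.round(2))
print(best_range)
print(attempts_table(sample))
```

On the real data, `make_rate_table(df)` would replace the sample. Reading the rates next to the attempt counts matters, because a best range backed by only a few shots may not hold up.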
    # 2. Visualize: create a scatter plot of the shots made by X and Y position on the court, with a different color for each of the four players.
    # This shows which positions each player prefers to shoot from.
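The scatter plot described in step 2 could be sketched as follows. This is one possible layout, not the notebook's own implementation: `plot_shot_chart` is a hypothetical helper, and it assumes only the SHOOTER, X, Y, and SCORE columns from the data dictionary.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_shot_chart(df: pd.DataFrame, made_only: bool = True):
    """Scatter shots by court position, one color per player. Returns the Axes."""
    shots = df[df["SCORE"] == "MADE"] if made_only else df

    fig, ax = plt.subplots(figsize=(7, 6))
    # One scatter call per player, so matplotlib assigns each a distinct color.
    for player, group in shots.groupby("SHOOTER"):
        ax.scatter(group["X"], group["Y"], label=player, alpha=0.6, s=20)

    ax.set_xlabel("X: horizontal distance from the basket (ft)")
    ax.set_ylabel("Y: vertical distance from the basket (ft)")
    ax.set_title("Made shots by court position" if made_only else "All shots by court position")
    ax.legend(title="Shooter")
    ax.set_aspect("equal")  # keep court proportions undistorted
    return ax


# Usage with the notebook's dataframe:
# ax = plot_shot_chart(df)
# plt.show()
```

Passing `made_only=False` plots missed shots as well, which can help separate where a player likes to shoot from where that player actually scores.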
    # 3. Analyze: build a sampling distribution and run hypothesis tests to check whether players are more likely to score the closer they are to the basket.
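One way to carry out step 3 is a permutation test, which builds the sampling distribution under the null hypothesis by shuffling the MADE/MISSED labels. The sketch below is an assumption about how the analysis might be done, not the notebook's method. It treats the basket as the origin, so the distance of a shot is sqrt(X² + Y²), and `distance_permutation_test` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def distance_permutation_test(df: pd.DataFrame, n_permutations: int = 5000, seed: int = 0):
    """One-sided permutation test: are made shots taken from closer in than missed shots?

    Returns (observed, p_value), where observed is the mean distance of missed shots
    minus the mean distance of made shots, in feet.
    Requires at least one made and one missed shot in df.
    """
    # Assumption: X and Y are offsets from the basket, so this is distance to the basket.
    distance = np.hypot(df["X"].to_numpy(), df["Y"].to_numpy())
    made = (df["SCORE"] == "MADE").to_numpy()

    observed = distance[~made].mean() - distance[made].mean()

    rng = np.random.default_rng(seed)
    null_diffs = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffling the labels breaks any link between distance and outcome.
        shuffled = rng.permutation(made)
        null_diffs[i] = distance[~shuffled].mean() - distance[shuffled].mean()

    # Share of shuffles at least as extreme as the observed gap (with +1 smoothing).
    p_value = (np.sum(null_diffs >= observed) + 1) / (n_permutations + 1)
    return observed, p_value


# Usage with the notebook's dataframe, one test per player:
# for player, shots in df.groupby("SHOOTER"):
#     print(player, distance_permutation_test(shots))
```

A small p-value suggests that made shots tend to come from closer to the basket for that player. Running four tests, one per player, raises the chance of a false positive, so borderline results deserve caution.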