NBA Shooting Data

    This dataset contains shooting statistics for four different players during the 2021 NBA Playoffs.

    Not sure where to begin? Scroll to the bottom to find challenges!

    import pandas as pd
    
    pd.read_csv("nba_players_shooting.csv", index_col=0)

    Data Dictionary

    variable  class   description
    SHOOTER   String  Name of the player taking the shot
    X         float   Horizontal distance of the shot taken from the basket in ft
    Y         float   Vertical distance of the shot taken from the basket in ft
    RANGE     String  Radius range of the shot taken from the basket in ft
    DEFENDER  String  Name of the player defending the shot
    SCORE     String  'MADE' if shot is scored, else 'MISSED'
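
    Before any analysis, it can help to confirm that a loaded frame matches the data dictionary above. The following is a minimal sketch, not part of the original notebook: `check_schema` is a hypothetical helper, and the sample rows (including the `"(0, 4)"` style of RANGE string) are made up for illustration.

```python
import pandas as pd

# Column names and SCORE values taken from the data dictionary.
EXPECTED_COLUMNS = ["SHOOTER", "X", "Y", "RANGE", "DEFENDER", "SCORE"]
VALID_SCORES = {"MADE", "MISSED"}


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the frame looks OK."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        # Without the expected columns the remaining checks cannot run.
        problems.append(f"missing columns: {missing}")
        return problems
    for col in ("X", "Y"):
        if not pd.api.types.is_numeric_dtype(df[col]):
            problems.append(f"{col} is not numeric")
    unexpected = set(df["SCORE"].dropna().unique()) - VALID_SCORES
    if unexpected:
        problems.append(f"unexpected SCORE values: {sorted(unexpected)}")
    return problems


# Made-up rows for illustration only; they are not taken from the dataset.
sample = pd.DataFrame(
    {
        "SHOOTER": ["Player A", "Player B"],
        "X": [-1.5, 12.0],
        "Y": [2.0, 20.5],
        "RANGE": ["(0, 4)", "(20, 24)"],
        "DEFENDER": ["Player C", "Player D"],
        "SCORE": ["MADE", "MISSED"],
    }
)
print(check_schema(sample))
```

    With the real file, the same check could be run on the frame returned by `pd.read_csv("nba_players_shooting.csv", index_col=0)`.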

    Source of dataset.

    Don't know where to start?

    Challenges are brief tasks designed to help you practice specific skills:

    • 🗺️ Explore: At what range is each player most likely to score a shot?
    • 📊 Visualize: Plot the shots made by the X and Y position on the court. For each shot, differentiate between the four different players.
    • 🔎 Analyze: Are players more likely to score a shot the closer they get to the basket?

    Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

    A university basketball team has hired you to use data to improve their performance. They want to know whether it's possible to use past data to provide tailored recommendations to players.

    As a test, you have been provided with NBA shooting data for four players. The manager of the university team has asked you whether it is possible to provide data-driven recommendations for each player based on their likelihood of making a shot. You must also include how reliable your findings are, as well as advice for each player.

    You will need to prepare a report that is accessible to a broad audience. It will need to outline your steps, findings, and conclusions.
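
    The scenario asks how reliable the findings are. One common way to express that is a confidence interval around each shooting percentage, which widens when a player has few attempts in a range. The sketch below uses the Wilson score interval; this is an illustrative choice rather than a method prescribed by the scenario, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(made: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a make rate (z=1.96 gives roughly 95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= made <= attempts:
        raise ValueError("made must be between 0 and attempts")
    p = made / attempts
    denom = 1 + z**2 / attempts
    centre = (p + z**2 / (2 * attempts)) / denom
    half_width = z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Same 60% make rate, very different certainty (the counts here are made up).
for made, attempts in [(6, 10), (60, 100)]:
    low, high = wilson_interval(made, attempts)
    print(f"{made}/{attempts}: {low:.2f} to {high:.2f}")
```

    Reporting the interval alongside each efficiency figure makes it clear when a "best range" rests on only a handful of shots.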

    Recommendations for the University Basketball Team:

    1. Analyze individual player shooting efficiency by range and defender to identify strengths and areas for improvement.

    2. Suggest targeted practice sessions focusing on ranges where each player is less efficient.

    3. Consider defender impact on shooting efficiency and develop strategies to overcome strong defenses.

    4. Tailored practice: focus on the shot types and ranges where each player shows lower efficiency.

    5. Defensive strategies: analyze how different defenders affect shooting efficiency and develop counter-strategies.

    6. Shot selection: encourage players to take shots from the ranges where they are most efficient while they work on weaker areas.
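
    The recommendations above refer to defender impact. A minimal sketch of that breakdown, using the SHOOTER, DEFENDER and SCORE columns from the data dictionary, might look like the following. `efficiency_by_defender` and its `min_attempts` cutoff are hypothetical, and the rows are made up for illustration.

```python
import pandas as pd


def efficiency_by_defender(df: pd.DataFrame, min_attempts: int = 1) -> pd.DataFrame:
    """Make rate for each shooter/defender pairing with at least min_attempts shots."""
    made = df["SCORE"].eq("MADE").astype(int)  # 1 for a made shot, 0 otherwise
    summary = (
        df.assign(SHOT_MADE=made)
        .groupby(["SHOOTER", "DEFENDER"])["SHOT_MADE"]
        .agg(ATTEMPTS="count", EFFICIENCY="mean")
        .reset_index()
    )
    # Drop pairings with too few shots to say anything meaningful.
    return summary[summary["ATTEMPTS"] >= min_attempts].reset_index(drop=True)


# Made-up rows for illustration only; they are not taken from the dataset.
shots = pd.DataFrame(
    {
        "SHOOTER": ["Player A"] * 6 + ["Player B"],
        "DEFENDER": ["Player C"] * 4 + ["Player D"] * 2 + ["Player C"],
        "SCORE": ["MADE", "MISSED", "MADE", "MADE", "MISSED", "MISSED", "MADE"],
    }
)
print(efficiency_by_defender(shots, min_attempts=2))
```

    Raising `min_attempts` trades coverage for stability: pairings with very few shots produce extreme percentages that are easy to over-interpret.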

    import pandas as pd

    # Step 1: Load the dataset
    df = pd.read_csv("nba_players_shooting.csv")

    # Assumes the 'RANGE' column is categorical, with ranges stored as strings such as "(0, 4)" or "(4, 8)",
    # and that 'SCORE' is 'MADE' for a successful shot and 'MISSED' for an unsuccessful one.

    # Step 2: Mark each shot as 1 for made and 0 for missed
    df['SHOT_MADE'] = (df['SCORE'] == 'MADE').astype(int)

    # Step 3: Calculate shooting efficiency within each range for each player
    # Group by 'SHOOTER' and 'RANGE', then use the sum of shots made and the shot count to find efficiency
    efficiency = df.groupby(['SHOOTER', 'RANGE']).agg(
        Shots_Made=('SHOT_MADE', 'sum'),
        Total_Shots=('SHOT_MADE', 'count')
    ).reset_index()
    efficiency['Efficiency'] = efficiency['Shots_Made'] / efficiency['Total_Shots']

    # Step 4: Identify the range with the highest shooting efficiency for each player
    best_range = efficiency.loc[efficiency.groupby('SHOOTER')['Efficiency'].idxmax()]

    # Display the range with the highest efficiency for each player
    print(best_range[['SHOOTER', 'RANGE', 'Efficiency']])

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the dataset
    df = pd.read_csv("nba_players_shooting.csv")

    # Assumes the dataset includes 'SHOOTER', 'X', 'Y', and 'SCORE' columns
    # Keep only the shots that were made
    df_made = df[df['SCORE'] == 'MADE']

    plt.figure(figsize=(15, 7.5))

    # One marker per player so the shooters can be told apart in the plot
    markers = ['o', 's', '^', 'x']
    players = df_made['SHOOTER'].unique()
    if len(players) > 4:
        # Pad the marker list if there are more than 4 players (extra players reuse 'o')
        markers += ['o'] * (len(players) - 4)

    for player, marker in zip(players, markers):
        # Plot each player's made shots as a separate series
        player_data = df_made[df_made['SHOOTER'] == player]
        plt.scatter(player_data['X'], player_data['Y'], label=player, marker=marker)

    # Plot decorations
    plt.title('Shots Made by X and Y Position on the Court')
    plt.xlabel('X Position (ft)')
    plt.ylabel('Y Position (ft)')
    # X and Y are distances from the basket, so these reference lines cross at the basket (0, 0)
    plt.axhline(0, color='black', linewidth=0.5)
    plt.axvline(0, color='black', linewidth=0.5)
    plt.legend()
    plt.grid(True, which='both', linestyle='--', linewidth=0.5)
    plt.show()

    import pandas as pd
    import numpy as np

    # Load the dataset
    df = pd.read_csv("nba_players_shooting.csv")

    # Assuming 'X' and 'Y' are the coordinates of the shot, calculate the distance from the basket
    # The basket is considered to be at the origin (0, 0)
    df['DISTANCE'] = np.sqrt(df['X']**2 + df['Y']**2)

    # Bin the distances into ranges (e.g., 0-4 ft, 4-8 ft, etc.)
    # Adjust bins based on your dataset's distance range and desired granularity;
    # shots farther than the last edge (40 ft here) fall outside every bin and are dropped from the summary
    bins = [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
    # include_lowest=True keeps a shot at exactly 0 ft in the first bin
    df['DISTANCE_RANGE'] = pd.cut(df['DISTANCE'], bins=bins, include_lowest=True)

    # Calculate shooting efficiency within each distance range
    # First, convert 'SCORE' to numerical values: 1 for made shots, 0 for missed shots
    df['SHOT_MADE'] = (df['SCORE'] == 'MADE').astype(int)

    # Group by 'DISTANCE_RANGE' and calculate efficiency
    # observed=False lists every bin, including those with no attempts
    efficiency_by_range = (
        df.groupby('DISTANCE_RANGE', observed=False)['SHOT_MADE']
        .agg(['mean', 'count'])
        .rename(columns={'mean': 'EFFICIENCY', 'count': 'ATTEMPTS'})
    )

    # Display the efficiency and attempts for each distance range
    print(efficiency_by_range)

    What we don't know is where the defender is on each shot, which can make a crucial difference: a wide-open shot is very different from one that has to be released quickly enough to clear an outstretched hand.