
NBA Shooting Data

This dataset contains shooting statistics for four different players during the 2021 NBA Playoffs.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd

pd.read_csv("nba_players_shooting.csv", index_col=0)

Data Dictionary

variable   class    description
SHOOTER    String   Name of the player taking the shot
X          float    Horizontal distance of the shot from the basket (ft)
Y          float    Vertical distance of the shot from the basket (ft)
RANGE      String   Radius range of the shot from the basket (ft)
DEFENDER   String   Name of the player defending the shot
SCORE      String   'MADE' if the shot is scored, else 'MISSED'

Source of dataset.
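Before any analysis, it can help to check that the file actually matches this dictionary. The sketch below is one way to do that, assuming the column names and the 'MADE'/'MISSED' labels are exactly as listed above. `prepare_shots` is a hypothetical helper; the two columns it adds, `SHOT_MADE` and `DISTANCE`, reuse the names the later cells in this notebook compute. It also assumes X and Y are feet measured from the basket, so the straight-line distance is sqrt(X² + Y²).

```python
import numpy as np
import pandas as pd

# Columns listed in the data dictionary (assumed to match the CSV header exactly)
EXPECTED_COLUMNS = {"SHOOTER", "X", "Y", "RANGE", "DEFENDER", "SCORE"}


def prepare_shots(df: pd.DataFrame) -> pd.DataFrame:
    """Validate the dictionary's columns and add SHOT_MADE and DISTANCE helper columns.

    Hypothetical helper: assumes X and Y are feet relative to the basket.
    """
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    # SCORE should only ever be 'MADE' or 'MISSED'
    unexpected = set(df["SCORE"].unique()) - {"MADE", "MISSED"}
    if unexpected:
        raise ValueError(f"unexpected SCORE values: {sorted(map(str, unexpected))}")

    out = df.copy()
    out["SHOT_MADE"] = (out["SCORE"] == "MADE").astype(int)  # 1 = made, 0 = missed
    out["DISTANCE"] = np.hypot(out["X"], out["Y"])  # straight-line feet from the basket
    return out
```

Failing loudly on an unexpected label (a typo such as 'Made') is a deliberate choice here: silently mapping it to 0 would understate a player's make rate.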

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: At what range is each player most likely to score a shot?
  • 📊 Visualize: Plot the shots made by their X and Y position on the court, marking each shot so the four players can be told apart.
  • 🔎 Analyze: Are players more likely to score a shot the closer they get to the basket?

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

A university basketball team has hired you to use data to improve their performance. They want to know whether it's possible to use past data to provide tailored recommendations to players.

As a test, you have been provided with NBA shooting data for four players. The manager of the university team has asked you whether it is possible to provide data-driven recommendations for each player based on their likelihood of making a shot. You must also state how reliable your findings are and give advice for each player.

You will need to prepare a report that is accessible to a broad audience. It will need to outline your steps, findings, and conclusions.

Recommendations for the University Basketball Team:

  1. Analyze each player's shooting efficiency by range and by defender to identify strengths and areas for improvement.

  2. Suggest targeted practice sessions focusing on the ranges where each player is least efficient.

  3. Consider how defenders affect shooting efficiency and develop strategies to overcome strong defenses.

  4. Tailored Practice: Focus practice on the shot types and ranges where players show lower efficiency.

  5. Defensive Strategies: Analyze how different defenders affect shooting efficiency and develop counter-strategies.

  6. Shot Selection: Encourage players to take shots from the ranges where they are most efficient while working on their weaker areas.
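Recommendations 3 and 5 call for measuring how defenders affect shooting efficiency. Assuming the DEFENDER column names the primary defender on each shot, one way to do this is to compare each shooter's make rate against a given defender with that shooter's overall make rate. `defender_impact` and the `min_attempts=10` cutoff are hypothetical choices for illustration; matchups with few shots are dropped because their rates swing widely.

```python
import pandas as pd


def defender_impact(df: pd.DataFrame, min_attempts: int = 10) -> pd.DataFrame:
    """Per shooter/defender pair: make rate, attempts, and difference from the shooter's overall rate.

    Hypothetical helper; expects SHOOTER, DEFENDER and SCORE columns as in the data dictionary.
    """
    made = (df["SCORE"] == "MADE").astype(int)  # 1 = made, 0 = missed

    # Each shooter's make rate across all defenders
    overall = made.groupby(df["SHOOTER"]).mean().rename("OVERALL_RATE")

    # Make rate and attempt count for every shooter/defender matchup
    matchups = (
        made.groupby([df["SHOOTER"], df["DEFENDER"]])
        .agg(["mean", "count"])
        .rename(columns={"mean": "RATE_VS_DEFENDER", "count": "ATTEMPTS"})
        .reset_index()
    )

    # Drop thin matchups (arbitrary threshold), then compare with the overall rate
    matchups = matchups[matchups["ATTEMPTS"] >= min_attempts]
    matchups = matchups.merge(overall, left_on="SHOOTER", right_index=True)
    matchups["DIFF"] = matchups["RATE_VS_DEFENDER"] - matchups["OVERALL_RATE"]

    # Most negative DIFF first: the defenders who hurt each shooter the most
    return matchups.sort_values("DIFF")
```

A negative `DIFF` flags a defender the shooter struggles against; it does not by itself separate defender quality from other factors such as shot distance, which differ between matchups.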

import pandas as pd

# Step 1: Load the dataset (the first column is the row index, as in the first cell)
df = pd.read_csv("nba_players_shooting.csv", index_col=0)

# Assumes 'RANGE' holds each shot's distance band as a string, e.g. "(0, 4)", "(4, 8)", etc.,
# and 'SCORE' is 'MADE' for a successful shot and 'MISSED' for an unsuccessful one.

# Step 2: Mark each shot as 1 if made and 0 if missed
df['SHOT_MADE'] = (df['SCORE'] == 'MADE').astype(int)

# Step 3: Calculate shooting efficiency within each range for each player
# (shots made divided by shots attempted)
efficiency = df.groupby(['SHOOTER', 'RANGE']).agg(
    Shots_Made=('SHOT_MADE', 'sum'),
    Total_Shots=('SHOT_MADE', 'count')
).reset_index()
efficiency['Efficiency'] = efficiency['Shots_Made'] / efficiency['Total_Shots']

# Step 4: Keep the range with the highest efficiency for each player
# (idxmax returns the first row on ties, and a range with very few attempts can top the list)
best_range = efficiency.loc[efficiency.groupby('SHOOTER')['Efficiency'].idxmax()]

# Display the range with the highest efficiency for each player
print(best_range[['SHOOTER', 'RANGE', 'Efficiency']])
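The `idxmax` step above picks the highest raw efficiency, so a range where a player went 1-for-1 would beat one where they went 45-for-100. Since the scenario asks how reliable the findings are, one option (not part of the original analysis) is to attach a Wilson score interval to each make rate. The sketch assumes the `Shots_Made` and `Total_Shots` columns produced by the code above; `wilson_interval` is a hypothetical helper name, and z = 1.96 corresponds to roughly 95% coverage.

```python
import numpy as np


def wilson_interval(made, attempts, z=1.96):
    """Wilson score interval for a make rate; z=1.96 gives roughly 95% coverage.

    Accepts scalars, lists, or pandas Series; returns (lower, upper) as NumPy arrays.
    """
    made = np.asarray(made, dtype=float)
    n = np.asarray(attempts, dtype=float)
    p = made / n  # observed make rate

    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width
```

For example, `low, high = wilson_interval(efficiency['Shots_Made'], efficiency['Total_Shots'])` followed by `efficiency['CI_Low'], efficiency['CI_High'] = low, high` adds the bounds as columns. Ranking by `CI_Low` instead of `Efficiency` is one conservative way to pick a best range, because it favours rates backed by many attempts: a 1-for-1 range gets a lower bound of only about 0.21.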

import pandas as pd
import matplotlib.pyplot as plt
from itertools import cycle

# Load the dataset
df = pd.read_csv("nba_players_shooting.csv", index_col=0)

# Keep only the shots that were made
# (the dataset includes 'SHOOTER', 'X', 'Y', and 'SCORE' columns)
df_made = df[df['SCORE'] == 'MADE']

plt.figure(figsize=(15, 7.5))

# One marker per player; cycle() repeats the list if there are more than four players
markers = cycle(['o', 's', '^', 'x'])

for player, marker in zip(df_made['SHOOTER'].unique(), markers):
    # Plot each player's made shots with their own marker
    player_data = df_made[df_made['SHOOTER'] == player]
    plt.scatter(player_data['X'], player_data['Y'], label=player, marker=marker)

# Plot decorations
plt.title('Shots Made by X and Y Position on the Court')
plt.xlabel('X Position (ft)')
plt.ylabel('Y Position (ft)')
plt.axhline(0, color='black', linewidth=0.5)  # y = 0: horizontal line through the basket
plt.axvline(0, color='black', linewidth=0.5)  # x = 0: vertical line through the basket
plt.gca().set_aspect('equal')  # keep feet to the same scale on both axes
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.show()
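The scatter plot shows where made shots cluster, but not how often shots from the same spot were missed. A numeric companion, sketched here as a swapped-in technique rather than part of the original notebook, is to split the court into square zones and compute each player's make rate per zone from all shots. `zone_make_rates` and the 5 ft default cell size are illustrative assumptions; X and Y are taken to be feet relative to the basket, as in the data dictionary.

```python
import numpy as np
import pandas as pd


def zone_make_rates(df: pd.DataFrame, cell_ft: float = 5.0) -> pd.DataFrame:
    """Make rate and attempts per shooter in square cell_ft x cell_ft court zones.

    Hypothetical helper; assumes X/Y in feet from the basket and SCORE in {'MADE', 'MISSED'}.
    """
    zones = df.assign(
        # Lower-left corner of the zone each shot falls in
        ZONE_X=np.floor(df["X"] / cell_ft) * cell_ft,
        ZONE_Y=np.floor(df["Y"] / cell_ft) * cell_ft,
        SHOT_MADE=(df["SCORE"] == "MADE").astype(int),
    )
    return (
        zones.groupby(["SHOOTER", "ZONE_X", "ZONE_Y"])["SHOT_MADE"]
        .agg(MAKE_RATE="mean", ATTEMPTS="count")
        .reset_index()
    )
```

Keeping `ATTEMPTS` next to `MAKE_RATE` matters here: smaller zones give finer detail but fewer shots per zone, so their rates are noisier.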
import pandas as pd
import numpy as np

# Load the dataset
df = pd.read_csv("nba_players_shooting.csv", index_col=0)

# X and Y are the shot's offsets from the basket in ft (see the data dictionary),
# so the basket sits at the origin (0, 0) and the distance is sqrt(X**2 + Y**2)
df['DISTANCE'] = np.hypot(df['X'], df['Y'])

# Bin the distances into 4 ft ranges (0-4 ft, 4-8 ft, etc.).
# include_lowest=True keeps shots at exactly 0 ft; shots beyond 40 ft fall outside the bins (NaN).
# Adjust the bins to your dataset's distance range and the granularity you want.
bins = [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
df['DISTANCE_RANGE'] = pd.cut(df['DISTANCE'], bins=bins, include_lowest=True)

# Convert 'SCORE' to numbers: 1 for made shots, 0 for missed shots
df['SHOT_MADE'] = (df['SCORE'] == 'MADE').astype(int)

# Efficiency is the mean of SHOT_MADE per range; observed=False keeps empty ranges in the output
efficiency_by_range = (
    df.groupby('DISTANCE_RANGE', observed=False)['SHOT_MADE']
    .agg(['mean', 'count'])
    .rename(columns={'mean': 'EFFICIENCY', 'count': 'ATTEMPTS'})
)

# Display the efficiency and attempts for each distance range
print(efficiency_by_range)
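The binned table answers the question descriptively for all four shooters pooled together. A compact per-player summary is the point-biserial correlation between distance and a made/missed flag, which is simply the Pearson correlation when one variable is 0/1. The sketch below (helper name hypothetical) computes it per shooter, so one player's shot mix does not dominate the pooled trend. A negative value suggests makes get rarer as distance grows; it measures the strength of a linear association only and is not a significance test.

```python
import numpy as np
import pandas as pd


def distance_make_correlation(df: pd.DataFrame) -> pd.Series:
    """Point-biserial correlation between shot distance and making the shot, per shooter.

    Hypothetical helper; assumes X/Y in feet from the basket and SCORE in {'MADE', 'MISSED'}.
    """
    shots = df.assign(
        DISTANCE=np.hypot(df["X"], df["Y"]),  # feet from the basket
        SHOT_MADE=(df["SCORE"] == "MADE").astype(int),  # 1 = made, 0 = missed
    )
    # Pearson correlation with a 0/1 variable is the point-biserial correlation
    return shots.groupby("SHOOTER")[["DISTANCE", "SHOT_MADE"]].apply(
        lambda g: g["DISTANCE"].corr(g["SHOT_MADE"])
    )
```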

What the data does not tell us is where the defender was on each shot: the DEFENDER column names the player, not their position. That can be a crucial difference maker, since a wide-open shooter faces a very different shot from one who has to release quickly enough to get the ball over an outstretched hand.