NBA Shooting Data

This dataset contains shooting statistics for four different players during the 2021 NBA Playoffs.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd

pd.read_csv("nba_players_shooting.csv", index_col=0)

Data Dictionary

variable | class  | description
-------- | ------ | -----------
SHOOTER  | String | Name of the player taking the shot
X        | float  | Horizontal distance of the shot taken from the basket in ft
Y        | float  | Vertical distance of the shot taken from the basket in ft
RANGE    | String | Radius range of the shot taken from the basket in ft
DEFENDER | String | Name of the player defending the shot
SCORE    | String | 'MADE' if shot is scored, else 'MISSED'

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: At what range is each player most likely to score a shot?
  • 📊 Visualize: Plot the shots made by the X and Y position on the court. For each shot, differentiate between the four different players.
  • 🔎 Analyze: Are players more likely to score a shot the closer they get to the basket?
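One way to approach the Analyze challenge is sketched below. Since X and Y are both measured from the basket, the straight-line shot distance can be taken as the hypotenuse of the two. The rows here are made-up toy data so the snippet runs on its own, and the bin edges are an arbitrary choice; with the real file you would start from `pd.read_csv("nba_players_shooting.csv", index_col=0)` instead.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only (not taken from the real dataset).
shots = pd.DataFrame({
    "X": [0.0, 3.0, 5.0, -9.0, 15.0, -7.0],
    "Y": [2.0, 4.0, 12.0, 12.0, 20.0, 24.0],
    "SCORE": ["MADE", "MADE", "MISSED", "MADE", "MISSED", "MISSED"],
})

# X and Y are offsets from the basket, so distance is the hypotenuse.
shots["DISTANCE"] = np.hypot(shots["X"], shots["Y"])
shots["MADE"] = (shots["SCORE"] == "MADE").astype(int)

# Assumed bin edges in feet; adjust them to the spread of the real data.
bins = [0, 10, 20, 40]
shots["DIST_BIN"] = pd.cut(shots["DISTANCE"], bins=bins, right=False)

# Share of made shots within each distance bin.
make_rate_by_distance = shots.groupby("DIST_BIN", observed=True)["MADE"].mean()
print(make_rate_by_distance)
```

If the make rate falls as the bins move outward, that supports the idea that closer shots go in more often, though bins with few attempts should be read with caution.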

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

A university basketball team has hired you to use data to improve their performance. They want to know whether it's possible to use past data to provide tailored recommendations to players.

As a test, you have been provided with NBA shooting data for four players. The manager of the university team has asked you whether it is possible to provide data-driven recommendations for each player based on their likelihood of making a shot. You must also include how reliable your findings are, as well as advice for each player.

You will need to prepare a report that is accessible to a broad audience. It will need to outline your steps, findings, and conclusions.
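The scenario asks how reliable the findings are. One common way to express that for a shooting percentage is a confidence interval around the observed make rate. The sketch below uses the Wilson score interval, which is a choice made here rather than something the dataset page prescribes; `wilson_interval` is a hypothetical helper name and the attempt counts are invented for illustration.

```python
import math


def wilson_interval(made: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a make rate (z=1.96)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = made / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    )
    return center - half_width, center + half_width


# Same 50% make rate, very different certainty (invented counts).
small_sample = wilson_interval(5, 10)
large_sample = wilson_interval(50, 100)
print(small_sample, large_sample)
```

The interval for 10 attempts is much wider than the one for 100, which is a useful caveat to attach to any per-player or per-range recommendation built on a handful of shots.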

What does this Data Set Cover?

Which player was the most accurate shooter? To answer this question, you could calculate the shooting percentage for each player.
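A minimal sketch of that calculation follows. The column names come from the data dictionary, but the player names and rows are placeholders so the snippet is self-contained; swap in the DataFrame loaded from `nba_players_shooting.csv` to get real numbers.

```python
import pandas as pd

# Placeholder rows; "Player A" and "Player B" are hypothetical names.
shots = pd.DataFrame({
    "SHOOTER": ["Player A", "Player A", "Player A", "Player B", "Player B"],
    "SCORE": ["MADE", "MISSED", "MADE", "MISSED", "MADE"],
})

# Flag made shots, then aggregate attempts, makes, and make rate per player.
summary = (
    shots.assign(MADE=shots["SCORE"].eq("MADE").astype(int))
    .groupby("SHOOTER")["MADE"]
    .agg(attempts="count", made="sum", pct="mean")
    .sort_values("pct", ascending=False)
)
print(summary)
```

Keeping the attempt count next to the percentage makes it easier to judge how much weight each player's figure deserves.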

How does shot accuracy vary by range? To answer this question, you could group shots into different radius ranges (e.g. 0-5 feet, 5-10 feet, etc.) and calculate the shooting percentage for each range. This could give you a sense of how players' shooting accuracy varies as they move farther away from the basket.
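As a rough sketch, the existing RANGE column can serve as the grouping key, giving one make rate per player and range. The range labels below are assumed placeholders, since the exact label format in the file is not shown here, and the rows are toy data.

```python
import pandas as pd

# Toy rows; the RANGE labels are assumed, not copied from the real file.
shots = pd.DataFrame({
    "SHOOTER": ["Player A"] * 4 + ["Player B"] * 4,
    "RANGE": ["(0, 4)", "(0, 4)", "(5, 9)", "(5, 9)"] * 2,
    "SCORE": ["MADE", "MADE", "MADE", "MISSED",
              "MISSED", "MADE", "MADE", "MADE"],
})
shots["MADE"] = (shots["SCORE"] == "MADE").astype(int)

# Rows are players, columns are ranges, cells are make rates.
rate_by_range = shots.pivot_table(
    index="SHOOTER", columns="RANGE", values="MADE", aggfunc="mean"
)

# For each player, the range with the highest make rate.
best_range = rate_by_range.idxmax(axis=1)
print(rate_by_range)
print(best_range)
```

The `best_range` series also speaks to the Explore challenge, with the caveat that a range with very few attempts can look best by chance.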

How does the quality of the defender affect shooting accuracy? To answer this question, you could calculate shooting percentages for each player based on who was defending them (e.g. shooting percentages against the team's best defender versus the worst defender). This could give you a sense of whether some players are more affected by defensive pressure than others.
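One possible shape for this comparison is sketched below: a make rate per shooter and defender pair, filtered to pairs with enough attempts to be worth reading. The names, rows, and the `MIN_ATTEMPTS` threshold are all illustrative assumptions.

```python
import pandas as pd

# Toy rows; shooter and defender names are hypothetical.
shots = pd.DataFrame({
    "SHOOTER": ["Player A", "Player A", "Player A",
                "Player B", "Player B", "Player B"],
    "DEFENDER": ["Defender 1", "Defender 1", "Defender 2",
                 "Defender 1", "Defender 1", "Defender 1"],
    "SCORE": ["MADE", "MISSED", "MADE", "MISSED", "MISSED", "MADE"],
})
shots["MADE"] = (shots["SCORE"] == "MADE").astype(int)

# Attempts and make rate for every shooter/defender matchup.
matchups = (
    shots.groupby(["SHOOTER", "DEFENDER"])["MADE"]
    .agg(attempts="count", pct="mean")
    .reset_index()
)

# Arbitrary cutoff for this sketch; pick one that suits the real sample sizes.
MIN_ATTEMPTS = 2
reliable = matchups[matchups["attempts"] >= MIN_ATTEMPTS]
print(reliable)
```

Dropping thin matchups avoids reading a single made or missed shot as evidence about a defender.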

How does the distribution of shot locations differ across players? To answer this question, you could create a heatmap or density plot of shot locations for each player. This could give you a sense of whether some players tend to shoot from certain areas of the court more than others.
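Before plotting, the density can be computed as a grid of shot shares per player. The sketch below uses `numpy.histogram2d` on toy coordinates; the grid edges are assumptions and would need to match the actual extent of X and Y in the file. Each resulting array could then be passed to a heatmap function such as `seaborn.heatmap`.

```python
import numpy as np
import pandas as pd

# Toy coordinates; player names are hypothetical.
shots = pd.DataFrame({
    "SHOOTER": ["Player A"] * 4 + ["Player B"] * 3,
    "X": [-1.0, 1.0, 2.0, -12.0, 11.0, 14.0, -3.0],
    "Y": [3.0, 4.0, 6.0, 15.0, 18.0, 12.0, 2.0],
})

# Assumed grid edges in feet: four columns across, two rows deep.
x_edges = np.array([-20.0, -10.0, 0.0, 10.0, 20.0])
y_edges = np.array([0.0, 10.0, 20.0])

grids = {}
for player, player_shots in shots.groupby("SHOOTER"):
    counts, _, _ = np.histogram2d(
        player_shots["X"], player_shots["Y"], bins=[x_edges, y_edges]
    )
    # Normalize to shares so players with different attempt counts compare fairly.
    grids[player] = counts / counts.sum()

print(grids["Player A"])
```

Working with shares rather than raw counts keeps a high-volume shooter from dominating the comparison.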

How do the shooting patterns of each player compare to league averages? This dataset covers only four players, so league-wide shooting percentages for each radius range would have to come from an external source. Comparing those figures with each player's shooting percentage in the same range would show which players are above or below average in shooting accuracy.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load the shots and keep the made ones for the first two plots.
df = pd.read_csv("nba_players_shooting.csv", index_col=0)
made_shots = df[df['SCORE'] == 'MADE']

# Made shots by court position, colored by shooter.
sns.scatterplot(data=made_shots, x='X', y='Y', hue='SHOOTER')
plt.legend(loc='upper right')
plt.show()

# Made shots per range for each player. RANGE holds string labels, so count
# the categories rather than binning them with plt.hist.
sns.countplot(data=made_shots, x='RANGE', hue='SHOOTER')
plt.legend(loc='upper right')
plt.show()

# All shots by court position, colored by outcome, with a linear fit of Y on X.
sns.scatterplot(data=df, x='X', y='Y', hue='SCORE')
sns.regplot(data=df, x='X', y='Y', scatter=False)
plt.show()

# The same linear fit restricted to made shots.
sns.regplot(data=made_shots, x='X', y='Y', scatter=False)
plt.show()