NBA Shooting Data
This dataset contains shooting statistics for four different players during the 2021 NBA Playoffs.
- Seth Curry
- Trae Young
- Chris Paul
- Russell Westbrook
In this project, we will compare the four players' shooting profiles and formulate a list of recommendations for each player to maximize their performance.
Data Dictionary
| variable | class | description |
|---|---|---|
| SHOOTER | String | Name of the player taking the shot |
| X | float | Horizontal distance of the shot taken from the basket in ft |
| Y | float | Vertical distance of the shot taken from the basket in ft |
| RANGE | String | Radius range of the shot taken from the basket in ft |
| DEFENDER | String | Name of the player defending the shot |
| SCORE | String | 'MADE' if shot is scored, else 'MISSED' |
Source of dataset.
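Since X and Y are both measured from the basket, they can be combined into a single straight-line shot distance. The snippet below is a minimal sketch of that idea. The rows are made-up toy values rather than records from the dataset, and the `DISTANCE` column name is an assumption introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are not values from the dataset.
shots = pd.DataFrame({
    "X": [3.0, -6.0, 0.0],
    "Y": [4.0, 8.0, 24.0],
})

# Straight-line distance from the basket in feet, assuming X and Y are
# offsets from the basket as described in the data dictionary.
# DISTANCE is a hypothetical column name, not part of the original data.
shots["DISTANCE"] = np.hypot(shots["X"], shots["Y"])
print(shots)
```

A distance column like this could later be compared against the RANGE column as a sanity check.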
The business case:
A university basketball team has hired you to use data to improve their performance. They want to know whether it's possible to use past data to provide tailored recommendations to players.
As a test, you have been provided with NBA shooting data for four players. The manager of the university team has asked you whether it is possible to provide data-driven recommendations for each player based on their likelihood of making a shot.
You will need to prepare a report that is accessible to a broad audience. It will need to outline your steps, findings, and conclusions.
I- Exploratory Data Analysis
First, let's explore the data and create some initial visualizations to understand it better.
1. Import the necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Assuming the dataset is in a CSV file named "nba_players_shooting.csv";
# the first column of the file is used as the row index
data = pd.read_csv("nba_players_shooting.csv", index_col=0)
2. Initial Data Exploration
In this step, we print the first rows, the shape, the column names, the data types, and the missing-value counts to see the general layout of the data.
# View the first few rows of the dataset
print("---------------------------------------------")
print("----------- Column Head----------------------")
print(data.head())
# Check the dataset's shape, columns, and data types
print("---------------------------------------------")
print("Data Shape is:")
print(data.shape)
print("---------------------------------------------")
print("Data Column titles are:")
print(data.columns)
print("---------------------------------------------")
print("Columns datatypes are:")
print(data.dtypes)
# Check for missing or null values
print("---------------------------------------------")
print("Missing values check.")
print(data.isnull().sum())
We can see that there are no missing values in this data, which means that we do not have to impute or drop any rows before the analysis.
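The missing-value check can also be written as an explicit guard, so that the notebook fails loudly if a future version of the file contains gaps or unexpected labels. This is a minimal sketch on a small made-up frame. The rows are toy values and not taken from the dataset; only the column names and the 'MADE'/'MISSED' labels come from the data dictionary.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from the dataset.
toy = pd.DataFrame({
    "SHOOTER": ["Seth Curry", "Trae Young", "Chris Paul"],
    "X": [-3.8, 0.1, 12.5],
    "Y": [5.6, 25.3, 8.2],
    "SCORE": ["MADE", "MISSED", "MADE"],
})

# Count missing values per column, then in total.
missing_per_column = toy.isnull().sum()
total_missing = int(missing_per_column.sum())

# SCORE should only contain the two labels listed in the data dictionary.
unexpected_scores = set(toy["SCORE"]) - {"MADE", "MISSED"}

assert total_missing == 0, f"Found missing values:\n{missing_per_column}"
assert not unexpected_scores, f"Unexpected SCORE labels: {unexpected_scores}"
print("No missing values and only expected SCORE labels.")
```

Replacing `toy` with the loaded `data` frame would apply the same guard to the real file.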
3. Generate Summary Statistics
# Generate summary statistics for numerical variables
print(data.describe())
# Generate frequency counts for categorical variables
print(data['SHOOTER'].value_counts())
print(data['RANGE'].value_counts())
print(data['DEFENDER'].value_counts())
print(data['SCORE'].value_counts())
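Frequency counts show how often each value occurs, but the business case asks about the likelihood of making a shot. One possible next step is to turn SCORE into a make rate per shooter and range. The snippet below is a sketch under stated assumptions: the rows are made-up toy values, the RANGE labels such as "(0, 4)" are assumed formats, and the `MADE`, `attempts`, and `make_rate` names are introduced here for illustration.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from the dataset.
# The RANGE label format is an assumption.
toy = pd.DataFrame({
    "SHOOTER": ["Seth Curry", "Seth Curry", "Seth Curry",
                "Trae Young", "Trae Young", "Trae Young"],
    "RANGE": ["(0, 4)", "(0, 4)", "(25, 29)",
              "(0, 4)", "(25, 29)", "(25, 29)"],
    "SCORE": ["MADE", "MISSED", "MADE", "MADE", "MISSED", "MISSED"],
})

# Boolean flag: True when the shot was scored (hypothetical helper column).
toy["MADE"] = toy["SCORE"].eq("MADE")

# Attempts and share of made shots for each shooter and range.
summary = (
    toy.groupby(["SHOOTER", "RANGE"])["MADE"]
    .agg(attempts="count", make_rate="mean")
    .reset_index()
)
print(summary)
```

Keeping the attempt count next to the rate matters, because a rate based on only a handful of shots is a weak basis for a recommendation.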