NBA Player Evolution: A Data-Driven Analysis from 1996 to 2023
Author: Kliz John Andrei Millares
Part I: NBA Player Records
1.1 Background
Basketball is a highly statistical sport, and analyzing player performance data can provide deep insights into players' strengths, weaknesses, and overall impact on the game. This analysis can help teams make informed decisions regarding player development, game strategies, and player acquisitions.
1.2 Objectives
The primary objectives of this analysis are:
- Player Performance Analysis: Evaluate the performance metrics of players across different seasons.
- Trend Analysis: Identify trends in player performance over their careers.
- Comparative Analysis: Compare player performance across different teams and demographics.
- Draft Analysis: Analyze the impact of draft details on player performance.
- Team Performance: Assess the contributions of individual players to team success.
- Impact of Player Attributes: Investigate the relationship between player attributes and performance.
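Most of these objectives reduce to grouping and aggregating the player-season rows. A minimal sketch of the Trend Analysis step follows. It assumes the `season` column holds labels such as "1996-97" that sort chronologically as plain strings. The helper `season_averages` and the toy rows are hypothetical and are not part of the original analysis.

```python
import pandas as pd


def season_averages(df: pd.DataFrame, metrics=("pts", "reb", "ast")) -> pd.DataFrame:
    """Mean of each metric per season, ordered by season label.

    Assumes labels like "1996-97", which sort correctly as text.
    """
    return df.groupby("season")[list(metrics)].mean().sort_index()


# Toy rows for illustration only (not real NBA records)
sample = pd.DataFrame({
    "season": ["1996-97", "1996-97", "1997-98"],
    "pts": [10.0, 6.0, 12.0],
    "reb": [4.0, 2.0, 5.0],
    "ast": [2.0, 1.0, 3.0],
})
print(season_averages(sample))
```

This is an unweighted mean, so a player who appeared in a handful of games counts as much as a starter. Weighting by `gp` is one alternative if that matters for the question at hand.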
1.3 Introduction
This exploratory data analysis (EDA) examines NBA player data from 1996 to 2023. By examining key performance metrics and other relevant attributes, we can uncover patterns and insights that are valuable for players, teams, and analysts.
1.4 Data Description
The dataset, taken from Kaggle, contains season-level records of NBA players. Each column, its description, and its pandas data type are listed below:
| Column Name | Description | Data Type |
|---|---|---|
| player_name | Name of the player | object |
| team_abbreviation | Team abbreviation | object |
| age | Age of the player | int64 |
| player_height | Height of the player (in cm) | float64 |
| player_weight | Weight of the player (in kg) | float64 |
| college | College the player attended | object |
| country | Country the player is from | object |
| draft_year | Year the player was drafted | int64 |
| draft_round | Round in which the player was drafted | int64 |
| draft_number | Overall pick number in the draft | int64 |
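| gp | Games played in the season | int64 |
| pts | Points per game | float64 |
| reb | Rebounds per game | float64 |
| ast | Assists per game | float64 |
| net_rating | Net rating: team point differential per 100 possessions while the player is on the court | float64 |
| oreb_pct | Offensive rebound percentage: share of available offensive rebounds the player grabbed while on the court | float64 |
| dreb_pct | Defensive rebound percentage: share of available defensive rebounds the player grabbed while on the court | float64 |
| usg_pct | Usage percentage: share of team plays the player used while on the court | float64 |
| ts_pct | True shooting percentage: scoring efficiency that accounts for two-pointers, three-pointers, and free throws | float64 |
| ast_pct | Assist percentage: share of teammate field goals the player assisted while on the court | float64 |
| season | Season | object |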
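The `ts_pct` column is stored ready-made. The table has no field-goal-attempt or free-throw-attempt columns, so it cannot be recomputed from this CSV. The standard true shooting formula is sketched below for reference only. `true_shooting_pct` and the stat line are hypothetical. It is also an assumption whether the dataset stores `ts_pct` as a fraction or as a percentage, so check that before comparing values.

```python
import math


def true_shooting_pct(points: float, fga: float, fta: float) -> float:
    """True shooting percentage as a fraction: PTS / (2 * (FGA + 0.44 * FTA))."""
    # 0.44 * FTA estimates the shooting possessions spent on free throws, since
    # not every trip to the line is two shots (and-ones, three-shot fouls, technicals).
    attempts = fga + 0.44 * fta
    if attempts == 0:
        return math.nan  # undefined when the player attempted no shots
    return points / (2 * attempts)


# Hypothetical stat line: 25 points on 18 field-goal attempts and 5 free-throw attempts
print(round(true_shooting_pct(25, 18, 5), 3))  # 0.619
```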
1.5 Exploratory Data Analysis (EDA)
Data Cleaning and Preparation
```python
# Import necessary packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the CSV file
file_path = 'data/all_seasons.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataframe
data.head()
```

```python
# Step 1: Data Overview
# Print column names, non-null counts, and dtypes (info() prints its report and returns None)
data.info()

# Count missing values per column
missing_values = data.isnull().sum()

# Count fully duplicated rows
duplicate_rows = data.duplicated().sum()

missing_values, duplicate_rows
```
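`data.info()` lists the dtypes, but comparing them to the data description table by eye is error-prone. This is especially true because pandas reads a column as `object` if it contains any non-numeric placeholder. The sketch below encodes the table's dtypes and reports mismatches. `EXPECTED_DTYPES` and `schema_mismatches` are hypothetical names, and the toy frame is illustrative only.

```python
import pandas as pd

# Expected pandas dtypes, transcribed from the data description table in section 1.4.
EXPECTED_DTYPES = {
    "player_name": "object", "team_abbreviation": "object", "age": "int64",
    "player_height": "float64", "player_weight": "float64", "college": "object",
    "country": "object", "draft_year": "int64", "draft_round": "int64",
    "draft_number": "int64", "gp": "int64", "pts": "float64", "reb": "float64",
    "ast": "float64", "net_rating": "float64", "oreb_pct": "float64",
    "dreb_pct": "float64", "usg_pct": "float64", "ts_pct": "float64",
    "ast_pct": "float64", "season": "object",
}


def schema_mismatches(df: pd.DataFrame, expected: dict = EXPECTED_DTYPES) -> dict:
    """Map each problem column to (expected dtype, actual dtype or 'missing')."""
    problems = {}
    for column, dtype in expected.items():
        if column not in df.columns:
            problems[column] = (dtype, "missing")
        elif str(df[column].dtype) != dtype:
            problems[column] = (dtype, str(df[column].dtype))
    return problems


# Toy frame: 'age' is float here and 'pts' is absent, so both are reported.
toy = pd.DataFrame({"age": [20.0, 31.0], "gp": [82, 60]})
print(schema_mismatches(toy, {"age": "int64", "gp": "int64", "pts": "float64"}))

# With the loaded data, an empty dict means every column matches the table:
# print(schema_mismatches(data))
```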
This initial overview shows that the dataset contains 12,844 rows spread across 22 columns. Each row is a player's record for one season, and each column holds one attribute of that record.
- Broad coverage: The columns include player names, team affiliations, age, height, weight, college attended, country of origin, draft details (year, round, number), game statistics (games played and per-game points, rebounds, and assists), advanced metrics (net rating, offensive/defensive rebound percentage, usage percentage, true shooting percentage, assist percentage), and the season.
- Missing values: College attended is missing for 1,854 entries. Other attributes such as age, height, weight, and the game statistics are complete.
- Complete performance records: Games played, points, and rebounds are recorded for every player, so performance comparisons can use every row.
- Good starting point: Despite the gaps in college information, the dataset is a solid basis for analyzing player profiles and performance metrics.
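The raw `isnull().sum()` counts are easier to judge as percentages of all rows. Below is a small sketch that reports only the columns with gaps. It then shows one possible treatment for `college`. Labelling missing values as "Unknown" is an assumption, not a step taken in the original analysis. `missing_report` and the toy rows are hypothetical.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Missing count and percentage per column, worst first; only columns with gaps."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct": (counts / len(df) * 100).round(2),
    })
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy rows for illustration only
sample = pd.DataFrame({
    "college": ["Duke", None, None, "UCLA"],
    "pts": [10.1, 4.0, 7.5, 12.3],
})
print(missing_report(sample))

# Assumed treatment: keep the rows and give missing colleges an explicit label,
# so they form their own group in later comparisons instead of being dropped.
sample["college"] = sample["college"].fillna("Unknown")
```

Applied to `data`, the report would put the 1,854 missing college entries in context against the 12,844 total rows.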
```python
# Compute descriptive statistics for the numeric columns
data.describe()
```

Player Performance Summary:
| Metric | Mean | Std Dev | Min | Max |
|---|---|---|---|---|
| Games Played (gp) | 51.15 | 25.08 | 1 | 85 |
| Points per Game (pts) | 8.21 | 6.02 | 0 | 36.1 |
| Rebounds per Game (reb) | 3.56 | 2.48 | 0 | 16.3 |
| Assists per Game (ast) | 1.82 | 1.80 | 0 | 11.7 |
| Net Rating (net_rating) | -2.23 | 12.67 | -250 | 300 |
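This summary table is a subset of the `describe()` output. The sketch below rebuilds it directly. `METRICS` and `performance_summary` are hypothetical names, and rounding to two decimals is an assumption chosen to match the table. The toy frame exists only so the function can be exercised without the CSV.

```python
import pandas as pd

# Column -> row label used in the summary table
METRICS = {
    "gp": "Games Played (gp)",
    "pts": "Points per Game (pts)",
    "reb": "Rebounds per Game (reb)",
    "ast": "Assists per Game (ast)",
    "net_rating": "Net Rating (net_rating)",
}


def performance_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, sample standard deviation, min, and max for each metric."""
    stats = df[list(METRICS)].describe().T[["mean", "std", "min", "max"]]
    stats = stats.rename(
        index=METRICS,
        columns={"mean": "Mean", "std": "Std Dev", "min": "Min", "max": "Max"},
    )
    return stats.round(2)


# Toy values for illustration; with the real data call performance_summary(data)
toy = pd.DataFrame({
    "gp": [10, 20, 30],
    "pts": [4.0, 8.0, 12.0],
    "reb": [1.0, 3.0, 5.0],
    "ast": [0.5, 1.5, 2.5],
    "net_rating": [-5.0, 0.0, 5.0],
})
print(performance_summary(toy))
```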
Insights from Player Performance Data:
- Games Played: Player-season records average 51.15 games, with a standard deviation of 25.08 games, and range from 1 to 85. A total above the 82-game regular season is possible when a player is traded mid-season to a team that has played fewer games.
- Points per Game: Players averaged 8.21 points per game, with values ranging from 0 to 36.1. The standard deviation of 6.02 points reflects the wide spread in scoring roles, from rarely used reserves to primary scorers.
- Rebounds per Game: Players averaged 3.56 rebounds per game, with a standard deviation of 2.48 and a range from 0 to 16.3.
- Assists per Game: Players averaged 1.82 assists per game, with a standard deviation of 1.80 and a range from 0 to 11.7.
- Net Rating: The mean net rating is -2.23. This means the average player-season record shows the team being outscored by about 2.2 points per 100 possessions with that player on the court. The mean is unweighted, so players with very little court time count as much as starters. The extremes (-250 and 300) are far outside any realistic season-long value and most likely come from players with very small samples. Net rating should therefore be filtered by playing time before players are compared.
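The data description lists no minutes column, so games played (`gp`) is the closest available proxy for sample size. The sketch below shows how a minimum-games filter narrows the net rating range. The helper `net_rating_range`, the 20-game cut-off, and the toy values are all hypothetical and are not real records or a threshold from the original analysis.

```python
import pandas as pd


def net_rating_range(df: pd.DataFrame, min_gp: int = 0) -> tuple:
    """(min, max) net_rating among rows with at least `min_gp` games played."""
    subset = df.loc[df["gp"] >= min_gp, "net_rating"]
    return float(subset.min()), float(subset.max())


# Toy values: two tiny-sample rows with extreme ratings, three regular rotation rows
sample = pd.DataFrame({
    "gp": [1, 2, 45, 70, 82],
    "net_rating": [-250.0, 300.0, -6.5, 3.2, 8.9],
})
print(net_rating_range(sample))             # (-250.0, 300.0)
print(net_rating_range(sample, min_gp=20))  # (-6.5, 8.9)
```

Games played is only a rough proxy. A player can appear in many games for a few minutes each, so any cut-off should be treated as a tunable assumption.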
1.6 Main Analysis
```python
# Distribution of points per game
plt.figure(figsize=(10, 6))
sns.histplot(data['pts'], bins=30, kde=True)
plt.title('Distribution of Points per Game')
plt.xlabel('Points per Game')
plt.ylabel('Frequency')
plt.show()
```
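A histogram shows the distribution's shape visually. Skewness and quartiles put numbers on it. In the sketch below, `describe_shape` and the toy series are hypothetical. With the real data, the call would be `describe_shape(data['pts'])`. A positive skew together with a mean above the median would indicate a long right tail.

```python
import pandas as pd


def describe_shape(series: pd.Series) -> dict:
    """Numeric companions to a histogram: skewness, quartiles, and mean."""
    q = series.quantile([0.25, 0.5, 0.75])
    return {
        "skew": float(series.skew()),
        "q25": float(q.loc[0.25]),
        "median": float(q.loc[0.5]),
        "q75": float(q.loc[0.75]),
        "mean": float(series.mean()),
    }


# Toy points-per-game values with one high scorer (not real data)
toy_pts = pd.Series([0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 25.0])
print(describe_shape(toy_pts))
```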