NBA Player Evolution: A Data-Driven Analysis from 1996 to 2023

Author: Kliz John Andrei Millares

Part I: NBA Player Records

1.1 Background

Basketball is a highly statistical sport, and analyzing player performance data can provide deep insights into players' strengths, weaknesses, and overall impact on the game. This analysis can help teams make informed decisions regarding player development, game strategies, and player acquisitions.

1.2 Objectives

The primary objectives of this analysis are:

  • Player Performance Analysis: Evaluate the performance metrics of players across different seasons.
  • Trend Analysis: Identify trends in player performance over their careers.
  • Comparative Analysis: Compare player performance across different teams and demographics.
  • Draft Analysis: Analyze the impact of draft details on player performance.
  • Team Performance: Assess the contributions of individual players to team success.
  • Impact of Player Attributes: Investigate the relationship between player attributes and performance.

1.3 Introduction

This exploratory data analysis (EDA) examines basketball player data across multiple seasons. By looking at key performance metrics and related player attributes, we can uncover patterns that are useful to players, teams, and analysts.

1.4 Data Description

The dataset, sourced from Kaggle, contains information about basketball players across different seasons. The table below lists each column with its description and data type:

| Column Name | Description | Data Type |
|---|---|---|
| player_name | Name of the player | object |
| team_abbreviation | Team abbreviation | object |
| age | Age of the player | int64 |
| player_height | Height of the player (in cm) | float64 |
| player_weight | Weight of the player (in kg) | float64 |
| college | College the player attended | object |
| country | Country the player is from | object |
| draft_year | Year the player was drafted | int64 |
| draft_round | Round in which the player was drafted | int64 |
| draft_number | Overall pick number in the draft | int64 |
| gp | Games played | int64 |
| pts | Points per game | float64 |
| reb | Rebounds per game | float64 |
| ast | Assists per game | float64 |
| net_rating | Net rating | float64 |
| oreb_pct | Offensive rebound percentage | float64 |
| dreb_pct | Defensive rebound percentage | float64 |
| usg_pct | Usage percentage | float64 |
| ts_pct | True shooting percentage | float64 |
| ast_pct | Assist percentage | float64 |
| season | Season | object |
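
The documented data types can be checked against what pandas actually infers when the file is loaded. The following is a minimal sketch, not part of the original analysis: `EXPECTED_DTYPES` (a small subset of the table) and `check_schema` are hypothetical names, and the `toy` frame is made-up data standing in for the real CSV.

```python
import pandas as pd

# Documented dtypes for a few numeric columns, copied from the table above.
# This is only a subset, chosen for illustration.
EXPECTED_DTYPES = {
    "age": "int64",
    "player_height": "float64",
    "gp": "int64",
    "pts": "float64",
}

def check_schema(df, expected):
    """Return {column: (expected, actual)} for each missing or mismatched column."""
    problems = {}
    for col, want in expected.items():
        got = str(df[col].dtype) if col in df.columns else "missing"
        if got != want:
            problems[col] = (want, got)
    return problems

# Made-up rows; 'age' is stored as float on purpose to trigger a mismatch.
toy = pd.DataFrame({
    "age": [25.0, 31.0],
    "player_height": [198.1, 210.8],
    "gp": [82, 10],
    "pts": [12.3, 4.5],
})
print(check_schema(toy, EXPECTED_DTYPES))
```

On the real data, the same call would be `check_schema(data, EXPECTED_DTYPES)` once `data` has been loaded; an empty dictionary means the listed columns match the table.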

1.5 Exploratory Data Analysis (EDA)

Data Cleaning and Preparation

# Import necessary packages
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the CSV file
file_path = 'data/all_seasons.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataframe
data.head()
# Step 1: Data Overview

# Print basic information about the dataset
# (info() prints its report directly and returns None, so there is nothing to assign)
data.info()

# Count missing values per column
missing_values = data.isnull().sum()

# Count fully duplicated rows
duplicate_rows = data.duplicated().sum()

# Display both results
missing_values, duplicate_rows

An initial look at the dataset shows 12,844 entries and 22 columns. That is one more column than the 21 listed in the data description; the extra one is most likely a row index saved with the CSV. Each entry describes a player in a given season.

  • Coverage: The columns include player names, team affiliations, age, height, weight, college attended, country of origin, draft details (year, round, number), game statistics (games played, points, rebounds, assists), performance metrics (net rating, offensive/defensive rebound percentage, usage percentage, true shooting percentage, assist percentage), and the season.

  • Missing values: The college column is missing for 1,854 entries (about 14% of the 12,844 rows). Other attributes such as age, height, weight, and the game statistics are complete.

  • Performance metrics: Games played, points, and rebounds are recorded for every entry, so performance comparisons do not require imputation.

  • Suitability: Apart from the missing college information, the dataset is a solid starting point for analyzing player profiles and performance metrics.
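
The missing-value counts above are easier to read next to their share of all rows. Below is a minimal sketch of such a report; `missing_report` is a hypothetical helper, and the `toy` frame is made-up data used only to show the output shape.

```python
import pandas as pd

def missing_report(df):
    """Count and percentage of missing values per column, largest first.

    Columns with no missing values are left out.
    """
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return report[report["missing"] > 0].sort_values("missing", ascending=False)

# Made-up rows: two of four players have no college recorded.
toy = pd.DataFrame({
    "player_name": ["Player A", "Player B", "Player C", "Player D"],
    "college": ["Duke", None, None, "UCLA"],
    "pts": [10.2, 3.1, 7.7, 0.0],
})
print(missing_report(toy))
```

Applied to the loaded data as `missing_report(data)`, this would be expected to list only the college column, given the counts reported above.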

# Computing descriptive statistics for the DataFrame
data.describe()

Player Performance Summary:

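| Metric | Mean | Std Dev | Min | Max |
|---|---|---|---|---|
| Games Played (gp) | 51.15 | 25.08 | 1 | 85 |
| Points per Game (pts) | 8.21 | 6.02 | 0 | 36.1 |
| Rebounds per Game (reb) | 3.56 | 2.48 | 0 | 16.3 |
| Assists per Game (ast) | 1.82 | 1.80 | 0 | 11.7 |
| Net Rating (net_rating) | -2.23 | 12.67 | -250 | 300 |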
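
A compact table like this can be derived from `describe()` by transposing it and keeping four of its rows. The sketch below shows one way to do that; `performance_summary` is a hypothetical helper and the `toy` values are made up.

```python
import pandas as pd

def performance_summary(df, columns):
    """Mean, standard deviation, min and max for the given columns, rounded to 2 decimals."""
    # describe() puts statistics in rows; transposing gives one row per metric.
    stats = df[columns].describe().T[["mean", "std", "min", "max"]]
    return stats.round(2)

# Made-up rows used only to show the output shape.
toy = pd.DataFrame({
    "gp": [82, 41, 1, 60],
    "pts": [20.0, 8.0, 0.0, 12.0],
})
print(performance_summary(toy, ["gp", "pts"]))
```

For the real data the call would be `performance_summary(data, ["gp", "pts", "reb", "ast", "net_rating"])`.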

Insights from Player Performance Data:

  • Games Played: The dataset covers players who participated in an average of 51.15 games, with considerable variability indicated by a standard deviation of 25.08 games. The range extends from 1 game to a maximum of 85 games played.

  • Points per Game: Players averaged 8.21 points per game, with scores ranging from 0 to 36.1. The standard deviation of 6.02 points highlights the diversity in scoring abilities among players.

  • Rebounds per Game: On average, players secured 3.56 rebounds per game, with a standard deviation of 2.48. The range spans from no rebounds to a maximum of 16.3 rebounds per game.

  • Assists per Game: Players averaged 1.82 assists per game, with a standard deviation of 1.80. The dataset shows a range from no assists to a maximum of 11.7 assists per game.

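  • Net Rating: The average net rating across entries is -2.23. Net rating is the team's point differential per 100 possessions while the player is on the court, so the negative mean says that the typical entry in the dataset sits slightly below break-even. Because this is an unweighted average, a player with one appearance counts as much as a full-time starter. The extremes of -250 and 300 are far beyond anything sustainable over a full season and almost certainly come from players with very few games or minutes, so they reflect small sample sizes more than player contribution.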
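
One way to see how much the net rating extremes depend on small samples is to recompute the range after dropping entries with few games played. This is a sketch under stated assumptions: `net_rating_range` is a hypothetical helper, the 20-game cutoff is an arbitrary illustrative choice, and the `toy` rows are made up (the -250 and 300 values are placed on 1- and 2-game rows purely for illustration).

```python
import pandas as pd

def net_rating_range(df, min_games=0):
    """Min and max net_rating among rows with at least `min_games` games played."""
    subset = df.loc[df["gp"] >= min_games, "net_rating"]
    return subset.min(), subset.max()

# Made-up rows: the extreme ratings belong to the shortest seasons.
toy = pd.DataFrame({
    "gp": [1, 2, 70, 82, 55],
    "net_rating": [-250.0, 300.0, 4.1, -3.2, 0.5],
})
print(net_rating_range(toy))                # all rows
print(net_rating_range(toy, min_games=20))  # drops the 1- and 2-game rows
```

If the range on the real data narrows sharply under such a filter, that would support reading the extremes as sample-size artifacts; the right cutoff is a judgment call.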

1.6 Main Analysis

# Distribution of points per game
plt.figure(figsize=(10, 6))
sns.histplot(data['pts'], bins=30, kde=True)
plt.title('Distribution of Points per Game')
plt.xlabel('Points per Game')
plt.ylabel('Frequency')
plt.show()
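
The shape of the histogram can also be summarized numerically. The sketch below reports the median, interquartile range, and skewness of a series; `distribution_summary` is a hypothetical helper and the `toy` values are made up, so no claim is made here about the real distribution.

```python
import pandas as pd

def distribution_summary(series):
    """Median, interquartile range and sample skewness of a numeric series."""
    q = series.quantile([0.25, 0.5, 0.75])
    return {
        "median": q.loc[0.5],
        "iqr": q.loc[0.75] - q.loc[0.25],
        "skew": series.skew(),  # positive values indicate a longer right tail
    }

# Made-up points-per-game values with one high scorer.
toy = pd.Series([0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 30.0])
print(distribution_summary(toy))
```

On the loaded data the call would be `distribution_summary(data['pts'])`.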