📖 Background
In the mystical land of Arcadia, where pixels and bits weave the fabric of reality, the ancient and revered Vault of Classics has begun to fade into obscurity. This vault, a legendary archive that houses the most iconic video games from 1980 to 2020, is in danger of being lost forever. Without intervention, the stories and legacies of these timeless classics may be forgotten.
You are a brave Data Sorcerer summoned by the Keepers of the vault. Your mission is not just a task, but an exhilarating adventure. You will delve into the vault, uncover the secrets of these legendary titles, and breathe new life into their stories. Along the way, you will face a series of challenges designed to test your skills in data analysis, visualization, and storytelling. Get ready for an epic journey! You must:
- Map the classics
- Race through time
This image was generated with an AI tool.
Summary of Video Game Analysis
This analysis encompasses two distinct challenges aimed at uncovering insights into the video game industry, specifically focusing on genre distribution and sales performance over four decades.
- Challenge 1: The Genre and Platform Expedition
In the first challenge, we explored the rich tapestry of video game genres and the publishers behind them from 1980 to 2020.
We aggregated the games into distinct genres such as Action, Adventure, Role-Playing, Simulation, and Sports. Through a series of visualizations, we illustrated how the popularity of each genre changed over time, highlighting shifts in consumer preferences and technological advancements.
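Genre popularity across time boils down to counting titles per (decade, genre) pair. The sketch below is one way to do that with `pd.crosstab`; the rows are hypothetical stand-ins for `vgsales.csv`, and deriving the decade with integer division is an assumption, not necessarily the method used in this notebook.

```python
import pandas as pd

# Hypothetical rows in the vgsales.csv layout (invented for illustration only).
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Year": [1985.0, 1988.0, 1995.0, 2003.0, 2007.0, 2012.0, None],
    "Genre": ["Platform", "Sports", "Platform", "Action", "Action", "Sports", "Puzzle"],
})

# Rows without a release year cannot be assigned to a decade.
valid = sample.dropna(subset=["Year"])

# Map each release year to a decade label, e.g. 1985.0 -> "1980s".
decade = ((valid["Year"] // 10) * 10).astype(int).astype(str) + "s"

# Rows are decades, columns are genres, cells are numbers of titles.
genre_by_decade = pd.crosstab(decade.rename("Decade"), valid["Genre"])

# Share of each genre within its decade (every row sums to 1).
genre_share = genre_by_decade.div(genre_by_decade.sum(axis=1), axis=0)
print(genre_by_decade)
```

Normalizing by row makes decades with very different release volumes comparable, which matters because far more games were released in the 2000s than in the 1980s.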
- Challenge 2: The Racing Bar Chart
This challenge involved creating an animated bar chart race to showcase the top-selling video games of all time. We compiled sales data from various regions (North America, Europe, Japan, and others) and calculated total global sales. Using the bar chart race technique, we visualized how game rankings evolved over time, highlighting the rise and fall of top-sellers. The final animation offered an engaging view of market shifts, revealing trends and the lasting impact of popular gaming franchises.
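A bar chart race needs a wide table with one row per time step and one column per bar; as far as I know, that is also the input layout the `bar_chart_race` package expects. The dataset only records each release's lifetime sales and release year, so this sketch assumes that a title's sales are credited in the year of each release and accumulated over time. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the vgsales.csv layout (values invented for illustration).
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game A", "Game C", "Game D"],
    "Platform": ["NES", "SNES", "GBA", "PS2", "DS"],
    "Year": [1985.0, 1991.0, 2003.0, 2004.0, None],
    "Global_Sales": [10.0, 5.0, 2.0, 8.0, 1.0],
})

# Keep rows with a known year and store the year as an integer time step.
valid = sample.dropna(subset=["Year"]).astype({"Year": int})

# One row per year, one column per title; releases of the same title are summed.
wide = valid.pivot_table(index="Year", columns="Name",
                         values="Global_Sales", aggfunc="sum")

# Insert years without releases so the race advances one year per frame.
years = range(valid["Year"].min(), valid["Year"].max() + 1)
wide = wide.reindex(years).fillna(0)

# Running total: a title's bar grows whenever a new version of it is released.
race_frame = wide.cumsum()

# The two leading titles in the final year.
print(race_frame.iloc[-1].nlargest(2))
```

Summing by `Name` merges multi-platform releases of the same game into one bar; grouping by `Name` and `Platform` instead would keep them separate.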
# To create animated bar chart races, we need to install the bar_chart_race library.
!pip install bar_chart_race
# pandas: for data manipulation and analysis, especially with tabular data (DataFrames).
import pandas as pd
# numpy: for numerical operations on arrays and matrices.
import numpy as np
# matplotlib.pyplot: for creating static, animated, and interactive visualizations.
import matplotlib.pyplot as plt
# seaborn: built on top of matplotlib, seaborn is used for statistical visualizations.
import seaborn as sns
# scipy.stats: skew calculates skewness (asymmetry) of data, trim_mean computes a trimmed (robust) mean, mstats provides statistics for masked arrays.
from scipy.stats import skew
from scipy.stats import trim_mean, mstats
# matplotlib.ticker: for controlling tick marks and labels on plots.
import matplotlib.ticker as ticker
# matplotlib.animation: allows creation of animations in matplotlib, such as bar chart races.
import matplotlib.animation as animation
# IPython.display.HTML: used to display animations and other rich content like HTML in Jupyter notebooks.
from IPython.display import HTML
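The imports above include `matplotlib.animation` and `HTML`, which are enough to draw a bar chart race without the `bar_chart_race` package. The sketch below is a minimal `FuncAnimation` version under stated assumptions: `race_frame` is a hypothetical cumulative-sales table (years as rows, titles as columns), and `draw_frame` is an illustrative helper, not part of any library.

```python
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pandas as pd

# Hypothetical cumulative-sales table: index = year, columns = titles (invented values).
race_frame = pd.DataFrame(
    {"Game A": [10.0, 10.0, 12.0],
     "Game B": [0.0, 5.0, 5.0],
     "Game C": [0.0, 0.0, 8.0]},
    index=[1985, 1991, 2004],
)

fig, ax = plt.subplots(figsize=(6, 3))

def draw_frame(i, n_bars=2):
    """Redraw the chart for row i of race_frame, keeping the n_bars largest titles."""
    ax.clear()
    # Ascending order puts the largest value at the top of a horizontal bar chart.
    top = race_frame.iloc[i].nlargest(n_bars).sort_values()
    ax.barh(top.index, top.values, color="steelblue")
    ax.set_title(f"Cumulative global sales up to {race_frame.index[i]}")
    ax.set_xlabel("Global sales (millions)")
    return ax.patches

# One frame per row; interval is the delay between frames in milliseconds.
anim = animation.FuncAnimation(fig, draw_frame, frames=len(race_frame),
                               interval=500, repeat=False)

# In Jupyter, HTML(anim.to_jshtml()) renders the animation inline.
```

Clearing and redrawing the axes on every frame is simple but not the fastest approach; for long races, updating bar widths in place would be cheaper.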
💾 The data
Column | Description |
---|---|
Rank | Ranking by overall sales |
Name | Name of the game |
Platform | Platform of the game's release (Wii, DS, PS3, etc.) |
Year | Release year |
Genre | Category of the game |
Publisher | Company that published the game (e.g. Nintendo, Microsoft Game Studios) |
NA_Sales | Sales in North America (in millions) |
EU_Sales | Sales in Europe (in millions) |
JP_Sales | Sales in Japan (in millions) |
Other_Sales | Sales in the rest of the world (in millions) |
Global_Sales | Total worldwide sales (in millions) |
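Since `Global_Sales` should be the total of the four regional columns, a quick consistency check can flag suspicious rows before any sales analysis. This sketch uses hypothetical rows, and the tolerance of 0.03 million is an assumption based on every figure being rounded to two decimals.

```python
import pandas as pd

REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Hypothetical rows (invented numbers) in the vgsales.csv layout.
sample = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "NA_Sales": [1.00, 2.00, 1.00],
    "EU_Sales": [0.50, 1.00, 1.00],
    "JP_Sales": [0.20, 0.00, 1.00],
    "Other_Sales": [0.10, 0.30, 1.00],
    "Global_Sales": [1.80, 3.31, 5.00],
})

# Sum of the four regions for each row.
regional_total = sample[REGIONS].sum(axis=1)

# Each figure is rounded to 0.01 million, so allow a small rounding tolerance.
TOLERANCE = 0.03
mismatch = sample[(regional_total - sample["Global_Sales"]).abs() > TOLERANCE]
print(mismatch["Name"].tolist())
```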
import pandas as pd
games = pd.read_csv('./data/vgsales.csv')
games.head()
💪 Challenge
Challenge 1: The Genre and Platform Expedition
- Investigate and visualize the distribution of video game genres and teams behind them from 1980 to 2020.
Challenge 2: The Racing Bar Chart Extravaganza
- Craft the ultimate bar chart race visual that crowns the top-selling video games of all time.
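For the "teams behind them" half of Challenge 1, the `Publisher` column is the closest proxy the dataset offers. A minimal sketch, assuming "leading publisher" means the one with the most titles in a genre (sales-weighted ranking would be an equally valid choice); the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows (invented for illustration only).
sample = pd.DataFrame({
    "Genre": ["Platform", "Platform", "Platform", "Sports", "Sports", "Action"],
    "Publisher": ["Nintendo", "Nintendo", "Sega",
                  "Electronic Arts", "Electronic Arts", "Capcom"],
})

# Count titles per (Genre, Publisher) pair.
counts = (sample.groupby(["Genre", "Publisher"]).size()
                .rename("Titles").reset_index())

# Within each genre, sort by title count (descending) and keep the first row.
top_publisher = (counts.sort_values(["Genre", "Titles"], ascending=[True, False])
                       .drop_duplicates("Genre")
                       .set_index("Genre"))
print(top_publisher)
```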
Analyze and Visualize the Distribution of Video Game Genres and Development Teams from 1980 to 2020
# Count the number of non-null entries in each column of the DataFrame 'games'
games.count()
# Display the data types of each column in the DataFrame 'games'
print(games.dtypes)
# Count the number of missing (NaN) values in each column of the DataFrame 'games'
games.isna().sum()
# Generate descriptive statistics for the 'Year' column in the DataFrame 'games'
games["Year"].describe()
# Check for missing values in the Publisher column
missing_count = games['Publisher'].isna().sum()
print(f"Number of missing values in 'Publisher': {missing_count}")
# Drop rows with missing Publisher
games.dropna(subset=['Publisher'], inplace=True)
# Check if there are still missing values
missing_values_count_after = games['Publisher'].isna().sum()
print(f"Number of missing values in 'Publisher' after filling: {missing_values_count_after}")
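# Create decade bins for the Year column.
# right=False makes each interval left-closed ([1980, 1990), ...); the last edge is 2021
# so that 2020 releases still fall into the final bin (2010 to 2020).
bins = [1980, 1990, 2000, 2010, 2021]
labels = ['1980s', '1990s', '2000s', '2010s']
games['Year_binned'] = pd.cut(games['Year'], bins=bins, labels=labels, right=False)
# Count the number of games released in each decade.
# value_counts() sorts by frequency, so sort_index() restores chronological decade order.
binned_counts = games['Year_binned'].value_counts().sort_index()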
# Plot the binned distribution
plt.figure(figsize=(8, 5))
binned_counts.plot(kind='bar', color='lightblue')
plt.title('Number of Video Games Released by Decade')
plt.xlabel('Decade')
plt.ylabel('Number of Games Released')
plt.xticks(rotation=45)
plt.show()
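Whether a boundary year lands in a bin depends on which side of each interval is closed. This sketch probes `pd.cut` with `right=False` on edges ending at 2020, using a few hypothetical boundary years: a year equal to the final edge is left unassigned (NaN) and is silently dropped by `value_counts()`, which is why the choice of the last edge matters when the study period ends in 2020.

```python
import pandas as pd

# Hypothetical years chosen to probe the edges of the decade bins.
years = pd.Series([1980, 1989, 1990, 2019, 2020])
edges = [1980, 1990, 2000, 2010, 2020]
labels = ["1980s", "1990s", "2000s", "2010s"]

# right=False makes every interval left-closed: [1980, 1990), ..., [2010, 2020).
decades = pd.cut(years, bins=edges, labels=labels, right=False)
print(decades.tolist())  # 2020 lies outside the last interval and becomes NaN
```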