Video Games Sales Data

This dataset contains records of popular video games in North America, Japan, Europe and other parts of the world. Every video game in this dataset has at least 100k global sales. Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
sales = pd.read_csv("vgsales.csv", index_col=0)
print(sales.shape)
sales.head(100)

Data Dictionary

Column | Explanation
Rank | Ranking of overall sales
Name | Name of the game
Platform | Platform of the game's release (e.g. PC, PS4)
Year | Year the game was released
Genre | Genre of the game
Publisher | Publisher of the game
NA_Sales | Number of sales in North America (in millions)
EU_Sales | Number of sales in Europe (in millions)
JP_Sales | Number of sales in Japan (in millions)
Other_Sales | Number of sales in other parts of the world (in millions)
Global_Sales | Number of total sales (in millions)

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: Which of the three seventh generation consoles (Xbox 360, PlayStation 3, and Nintendo Wii) had the highest total sales globally?
  • 📊 Visualize: Create a plot visualizing the average sales for games in the most popular three genres. Differentiate between NA, EU, and global sales.
  • 🔎 Analyze: Are some genres significantly more likely to perform better or worse in Japan than others? If so, which ones?

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You are working as a data analyst for a video game retailer based in Japan. The retailer typically orders games based on sales in North America and Europe, as the games are often released later in Japan. However, they have found that North American and European sales are not always a perfect predictor of how a game will sell in Japan.

Your manager has asked you to develop a model that can predict the sales in Japan using sales in North America and Europe and other attributes such as the name of the game, the platform, the genre, and the publisher.

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.

Which of the three seventh generation consoles (Xbox 360, PlayStation 3, and Nintendo Wii) had the highest total sales globally?

import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV data
sales = pd.read_csv('vgsales.csv')

# Filter for 7th generation consoles
gen7 = sales[sales['Platform'].isin(['X360', 'PS3', 'Wii'])]

# Group by platform, sum Global_Sales, and sort so the index order matches the bar order
total_sales = gen7.groupby('Platform')['Global_Sales'].sum().sort_values()

# Plot the results
plt.figure(figsize=(8, 5))
total_sales.plot(kind='barh', color='skyblue')

# Highlight the top-selling console
top_selling_console = total_sales.idxmax()
top_sales = total_sales.max()
plt.title('Global Sales by Console (7th Generation)')
plt.xlabel('Global Sales (Millions)')
plt.axvline(top_sales, color='green', linestyle='--', linewidth=1)
# total_sales is already sorted, so get_loc returns the y position of the top bar
plt.text(top_sales + 0.5, total_sales.index.get_loc(top_selling_console),
         f'Top: {top_selling_console}', color='green', va='center')

plt.tight_layout()
plt.show()

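The chart shows that games released on the Xbox 360 had the highest total global sales of the three seventh-generation consoles, at more than 900 million copies. Note that the dataset counts copies of games sold on each platform, not the consoles themselves.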
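To check the number behind that statement rather than reading it off the chart, the totals can also be returned as a ranked table. The following is a minimal sketch; the helper name `platform_totals` is hypothetical, and it assumes the `Platform` and `Global_Sales` columns described in the data dictionary.

```python
import pandas as pd


def platform_totals(df: pd.DataFrame, platforms: list) -> pd.Series:
    """Return total Global_Sales (in millions) per platform, largest first.

    `platform_totals` is an illustrative helper, not part of pandas.
    """
    # Keep only the rows for the requested platforms.
    subset = df[df["Platform"].isin(platforms)]
    # Sum sales per platform and rank from highest to lowest.
    return (
        subset.groupby("Platform")["Global_Sales"]
        .sum()
        .sort_values(ascending=False)
    )
```

Calling `platform_totals(sales, ['X360', 'PS3', 'Wii'])` should return a Series whose first entry is the top-selling platform, with totals in millions of copies.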

Create a plot visualizing the average sales for games in the most popular three genres. Differentiate between NA, EU, and global sales.

import pandas as pd
import matplotlib.pyplot as plt

# Assuming the data is already loaded into the `sales` DataFrame
# sales = pd.read_csv('vgsales.csv')  # Uncomment if you're loading from CSV

# Step 1: Find the top 3 most common genres (by number of games)
top_genres = sales['Genre'].value_counts().nlargest(3).index.tolist()

# Step 2: Filter for those genres
filtered = sales[sales['Genre'].isin(top_genres)]

# Step 3: Group by genre and calculate average sales
avg_sales = filtered.groupby('Genre')[['NA_Sales', 'EU_Sales', 'Global_Sales']].mean()

# Step 4: Plot
avg_sales.plot(kind='bar', figsize=(10, 6), colormap='viridis')

plt.title('Average Sales by Genre (Top 3 Genres)')
plt.xlabel('Genre')
plt.ylabel('Average Sales (millions)')
plt.legend(title='Region')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Global averages are the highest in every genre, as expected, since global sales include all regions. North America has higher average sales than Europe in each of the three genres. Sports leads in average global sales, with Action close behind, while Misc trails both across all regions.
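The analysis above reads "most popular" as "most games released". Another reading, most copies sold, may select different genres. The sketch below makes that choice explicit; `top_genres` and its `by` parameter are illustrative names, not part of the dataset or of pandas.

```python
import pandas as pd


def top_genres(df: pd.DataFrame, n: int = 3, by: str = "count") -> list:
    """Return the n leading genres, ranked by title count or by total global sales."""
    if by == "count":
        # value_counts() sorts from most to least frequent by default.
        ranking = df["Genre"].value_counts()
    elif by == "sales":
        ranking = (
            df.groupby("Genre")["Global_Sales"].sum().sort_values(ascending=False)
        )
    else:
        raise ValueError("by must be 'count' or 'sales'")
    return ranking.head(n).index.tolist()
```

Comparing `top_genres(sales, by="count")` with `top_genres(sales, by="sales")` shows whether the choice of definition changes which genres end up in the plot.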

Are some genres significantly more likely to perform better or worse in Japan than others? If so, which ones?

import pandas as pd
import matplotlib.pyplot as plt

# Assuming the data is already loaded into the `sales` DataFrame
# sales = pd.read_csv('vgsales.csv')  # Uncomment if loading from CSV

# Step 1: Group by Genre and calculate the average JP_Sales
avg_jp_sales = sales.groupby('Genre')['JP_Sales'].mean()

# Step 2: Sort by JP_Sales to see which genres perform better or worse
avg_jp_sales_sorted = avg_jp_sales.sort_values(ascending=False)

# Step 3: Plot the results
avg_jp_sales_sorted.plot(kind='bar', figsize=(12, 6), color='skyblue')

plt.title('Average Japanese Sales by Genre')
plt.xlabel('Genre')
plt.ylabel('Average Japanese Sales (millions)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Print out the sorted values to view top and bottom genres
print(avg_jp_sales_sorted)

Role-Playing games have the highest average sales in Japan. Platform, Fighting, and Puzzle games also perform well. In contrast, Shooter and Adventure games have the lowest average sales.

In short, Japanese players tend to favor Role-Playing and Platform games, while Shooter games are less well received. This comparison rests on average sales per genre only; it does not test whether the differences are statistically significant.
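Average sales in Japan alone cannot separate genres that sell well everywhere from genres that do comparatively well in Japan. One way to look at the second question is the share of each genre's global sales that comes from Japan. This is a sketch under that assumption, not a significance test, and `jp_share_by_genre` is a hypothetical helper.

```python
import pandas as pd


def jp_share_by_genre(df: pd.DataFrame) -> pd.Series:
    """Fraction of each genre's global sales that came from Japan, largest first."""
    # Sum Japanese and global sales per genre before dividing, so that
    # large titles weigh more than small ones.
    totals = df.groupby("Genre")[["JP_Sales", "Global_Sales"]].sum()
    share = totals["JP_Sales"] / totals["Global_Sales"]
    return share.sort_values(ascending=False)
```

A genre near the top of `jp_share_by_genre(sales)` sells a larger part of its copies in Japan than a genre near the bottom, regardless of how large the genre is overall.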

Python Steps for Predicting Japanese Sales
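The scenario does not specify a model, so the following is only a baseline sketch: ordinary least squares fitted with NumPy, using `NA_Sales`, `EU_Sales`, and one-hot encoded `Platform` and `Genre` as predictors of `JP_Sales`. The function names `build_features` and `fit_and_score`, the 80/20 split, and the choice to leave out `Name` and `Publisher` (both have many distinct values) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sales columns plus one-hot encoded Platform and Genre, with an intercept."""
    X = pd.get_dummies(
        df[["NA_Sales", "EU_Sales", "Platform", "Genre"]],
        columns=["Platform", "Genre"],
        dtype=float,
    )
    X.insert(0, "intercept", 1.0)
    return X


def fit_and_score(df: pd.DataFrame, test_frac: float = 0.2, seed: int = 0):
    """Fit least squares on a random split; return (coefficients, test MAE)."""
    needed = ["NA_Sales", "EU_Sales", "JP_Sales", "Platform", "Genre"]
    df = df.dropna(subset=needed)
    X = build_features(df)
    X_values = X.to_numpy(dtype=float)
    y = df["JP_Sales"].to_numpy(dtype=float)

    # Shuffle row positions and hold out the first n_test of them.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))
    n_test = max(1, int(len(df) * test_frac))
    test_idx, train_idx = order[:n_test], order[n_test:]

    # lstsq returns the minimum-norm solution, which copes with the
    # collinearity between the intercept and the full set of dummy columns.
    coef, *_ = np.linalg.lstsq(X_values[train_idx], y[train_idx], rcond=None)

    # Mean absolute error on the held-out rows, in millions of copies.
    predictions = X_values[test_idx] @ coef
    mae = float(np.mean(np.abs(predictions - y[test_idx])))
    return pd.Series(coef, index=X.columns), mae
```

With the data loaded, `coefficients, mae = fit_and_score(sales)` gives a first impression of how much North American and European sales explain Japanese sales. A report for a broad audience would likely compare this baseline against more flexible models before drawing conclusions.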
