Video Games Sales Data

This dataset contains sales records for popular video games in North America, Japan, Europe, and other parts of the world. Every game in the dataset has at least 100,000 global sales.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd

# Load the sales data; index_col=0 uses the first column of the CSV as the index
pd.read_csv("vgsales.csv", index_col=0)

Data Dictionary

Column       | Explanation
-------------|------------------------------------------------------------
Rank         | Ranking of overall sales
Name         | Name of the game
Platform     | Platform of the game's release (e.g. PC, PS4)
Year         | Year the game was released
Genre        | Genre of the game
Publisher    | Publisher of the game
NA_Sales     | Number of sales in North America (in millions)
EU_Sales     | Number of sales in Europe (in millions)
JP_Sales     | Number of sales in Japan (in millions)
Other_Sales  | Number of sales in other parts of the world (in millions)
Global_Sales | Number of total sales (in millions)
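
Since the data dictionary describes Global_Sales as the total, one quick sanity check is to compare it with the sum of the four regional columns. The sketch below assumes the column names listed above; the helper name `regional_sum_gap` is made up for this example and is not part of the dataset or of pandas.

```python
import pandas as pd

# Regional columns as named in the data dictionary.
REGION_COLUMNS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]


def regional_sum_gap(df: pd.DataFrame) -> pd.Series:
    """Return Global_Sales minus the sum of the regional columns, per row.

    Hypothetical helper for illustration: values near zero mean the
    regional figures add up to the reported global total.
    """
    return df["Global_Sales"] - df[REGION_COLUMNS].sum(axis=1)
```

With the CSV loaded into a DataFrame (called `sales` here purely for illustration), `regional_sum_gap(sales).abs().max()` gives the largest mismatch. Small gaps may simply reflect rounding in the published figures.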

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: Which of the three seventh-generation consoles (Xbox 360, PlayStation 3, and Nintendo Wii) had the highest total sales globally?
  • 📊 Visualize: Create a plot visualizing the average sales for games in the three most popular genres. Differentiate between NA, EU, and global sales.
  • 🔎 Analyze: Are some genres significantly more likely to perform better or worse in Japan than others? If so, which ones?
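
As a possible starting point for the challenges above, here is a minimal pandas sketch, not a reference solution. The function names are invented for this example. "Most popular" is assumed to mean highest total global sales, and the Japan share is only a descriptive summary; it does not test for statistical significance.

```python
import pandas as pd


def total_sales_by_platform(df: pd.DataFrame, platforms: list) -> pd.Series:
    """Sum Global_Sales per platform, restricted to the given platform codes."""
    subset = df[df["Platform"].isin(platforms)]
    totals = subset.groupby("Platform")["Global_Sales"].sum()
    return totals.sort_values(ascending=False)


def mean_sales_top_genres(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Mean NA, EU and global sales for the n genres with the highest
    total global sales (an assumed definition of 'most popular')."""
    top = df.groupby("Genre")["Global_Sales"].sum().nlargest(n).index
    columns = ["NA_Sales", "EU_Sales", "Global_Sales"]
    return df[df["Genre"].isin(top)].groupby("Genre")[columns].mean()


def japan_share_by_genre(df: pd.DataFrame) -> pd.Series:
    """Fraction of each genre's global sales that came from Japan."""
    totals = df.groupby("Genre")[["JP_Sales", "Global_Sales"]].sum()
    share = totals["JP_Sales"] / totals["Global_Sales"]
    return share.sort_values(ascending=False)
```

The platform codes to pass in depend on how the Platform column spells them, so it is worth inspecting `df["Platform"].unique()` first. The table returned by `mean_sales_top_genres` can be handed to `DataFrame.plot.bar()` for a grouped bar chart if matplotlib is installed.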

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You are working as a data analyst for a video game retailer based in Japan. The retailer typically orders games based on sales in North America and Europe, as the games are often released later in Japan. However, they have found that North American and European sales are not always a perfect predictor of how a game will sell in Japan.

Your manager has asked you to develop a model that can predict the sales in Japan using sales in North America and Europe and other attributes such as the name of the game, the platform, the genre, and the publisher.

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.
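
One way to begin the modelling task in this scenario is a plain linear baseline that predicts JP_Sales from NA_Sales and EU_Sales alone. The sketch below uses ordinary least squares via NumPy. The function names are hypothetical, and the choice of a linear model is an assumption made for illustration, not something the scenario prescribes.

```python
import numpy as np
import pandas as pd


def _design_matrix(df: pd.DataFrame) -> np.ndarray:
    """Build columns [1, NA_Sales, EU_Sales]; the leading 1 is the intercept."""
    return np.column_stack([
        np.ones(len(df)),
        df["NA_Sales"].to_numpy(dtype=float),
        df["EU_Sales"].to_numpy(dtype=float),
    ])


def fit_jp_baseline(df: pd.DataFrame) -> np.ndarray:
    """Least-squares fit of JP_Sales on an intercept, NA_Sales and EU_Sales.

    Returns the coefficients in the order [intercept, NA_Sales, EU_Sales].
    """
    y = df["JP_Sales"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(_design_matrix(df), y, rcond=None)
    return coef


def predict_jp(df: pd.DataFrame, coef: np.ndarray) -> np.ndarray:
    """Predict JP_Sales for each row from previously fitted coefficients."""
    return _design_matrix(df) @ coef
```

A fuller model would hold out part of the data for evaluation and bring in the categorical attributes from the scenario, for example by one-hot encoding Platform, Genre, and Publisher with `pd.get_dummies` before fitting.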

import numpy as np
