Skip to content

Video games are big business: the global gaming market is projected to be worth more than $300 billion by 2027 according to Mordor Intelligence. With so much money at stake, the major game publishers are hugely incentivized to create the next big hit. But are games getting better, or has the golden age of video games already passed?

In this project, you'll analyze video game critic and user scores as well as sales data for the top 400 video games released since 1977. You'll search for a golden age of video games by identifying release years that users and critics liked best, and you'll explore the business side of gaming by looking at game sales data.

Your search will involve joining datasets and comparing results with set theory. You'll also filter, group, and order data. Make sure you brush up on these skills before trying this project! The database contains two tables. Each table has been limited to 400 rows for this project, but you can find the complete dataset with over 13,000 games on Kaggle.

game_sales table

ColumnDefinitionData Type
nameName of the video gamevarchar
platformGaming platformvarchar
publisherGame publishervarchar
developerGame developervarchar
games_soldNumber of copies sold (millions)float
yearRelease yearint

reviews table

ColumnDefinitionData Type
nameName of the video gamevarchar
critic_scoreCritic score according to Metacriticfloat
user_scoreUser score according to Metacriticfloat

users_avg_year_rating table

ColumnDefinitionData Type
yearRelease year of the games reviewedint
num_gamesNumber of games released that yearint
avg_user_scoreAverage score of all the games ratings for the yearfloat

critics_avg_year_rating table

ColumnDefinitionData Type
yearRelease year of the games reviewedint
num_gamesNumber of games released that yearint
avg_critic_scoreAverage score of all the games ratings for the yearfloat

Find the top ten best selling games of all time.

The output should contain all the columns in the game_sales table and be sorted by the games_sold column in descending order. Save the output as best_selling_games.

Build a query to return the best selling games as measured by games_sold.

Spinner
DataFrameas
best_selling_games
variable
-- best_selling_games
-- Select all information for the top ten best-selling games
-- Order the results from best-selling game down to tenth best-selling

SELECT * FROM game_sales
ORDER BY games_sold DESC
LIMIT 10;
Spinner
DataFrameas
df4
variable
SELECT * FROM game_sales
Spinner
DataFrameas
df7
variable
SELECT COUNT(*) FROM game_sales
Spinner
DataFrameas
df5
variable
SELECT * FROM reviews
Spinner
DataFrameas
df8
variable
SELECT COUNT(*) FROM reviews

Find the top ten years with the highest rated average from critics.

Find the ten years with the highest average critic score, where at least four games were released (to ensure a good sample size). Return an output with the columns year, num_games released, and avg_critic_score. The avg_critic_score should be rounded to 2 decimal places. The table should be ordered by avg_critic_score in descending order. Save the output as critics_top_ten_years.

Execute a query that finds the top ten years where games had the highest average rating from critics.

Spinner
DataFrameas
critics_top_ten_years
variable
-- critics_top_ten_years
-- Select release year and average critic score for each year, rounded and aliased
-- Join the game_sales and reviews tables
-- Group by release year
-- Order the data from highest to lowest avg_critic_score and limit to 10 results

SELECT g.year, COUNT(r.name) AS num_games, ROUND(AVG(critic_score), 2) AS avg_critic_score
FROM game_sales AS g
INNER JOIN reviews AS r
ON g.name = r.name
GROUP BY year
HAVING COUNT(r.name) > 4
ORDER BY avg_critic_score DESC
LIMIT 10
Spinner
DataFrameas
df1
variable
SELECT COUNT(*) FROM public.critics_avg_year_rating
Spinner
DataFrameas
df2
variable
SELECT * FROM public.critics_avg_year_rating
ORDER BY year
Spinner
DataFrameas
df
variable
SELECT COUNT(*) FROM public.users_avg_year_rating
Spinner
DataFrameas
df3
variable
SELECT * FROM public.users_avg_year_rating
ORDER BY year

Find the golden years of video games, based on the user and critic reviews.

Find the years where critics and users broadly agreed that games released were highly rated. Specifically, return the years where the average critic score was over 9 OR the average user score was over 9. The pre-computed average critic and user scores per year are stored in users_avg_year_rating and critics_avg_year_rating tables respectively. The query should return the following columns: year, num_games, avg_critic_score, avg_user_score, and diff. The diff column should be the difference between the avg_critic_score and avg_user_score, it should be a positive float. The table should be ordered by the diff column in ascending order, save this as a dataframe named golden_years.

Execute a query that finds all the games, which had an average rating from users or critics of over 9.