Video games are big business: the global gaming market is projected to be worth more than $300 billion by 2027 according to Mordor Intelligence. With so much money at stake, the major game publishers are hugely incentivized to create the next big hit. But are games getting better, or has the golden age of video games already passed?
In this project, you'll analyze video game critic and user scores as well as sales data for the top 400 video games released since 1977. You'll search for a golden age of video games by identifying release years that users and critics liked best, and you'll explore the business side of gaming by looking at game sales data.
Your search will involve joining datasets and comparing results with set theory. You'll also filter, group, and order data. Make sure you brush up on these skills before trying this project! The database contains two tables. Each table has been limited to 400 rows for this project, but you can find the complete dataset with over 13,000 games on Kaggle.
game_sales table
game_sales table| Column | Definition | Data Type |
|---|---|---|
| name | Name of the video game | varchar |
| platform | Gaming platform | varchar |
| publisher | Game publisher | varchar |
| developer | Game developer | varchar |
| games_sold | Number of copies sold (millions) | float |
| year | Release year | int |
reviews table
reviews table| Column | Definition | Data Type |
|---|---|---|
| name | Name of the video game | varchar |
| critic_score | Critic score according to Metacritic | float |
| user_score | User score according to Metacritic | float |
users_avg_year_rating table
users_avg_year_rating table| Column | Definition | Data Type |
|---|---|---|
| year | Release year of the games reviewed | int |
| num_games | Number of games released that year | int |
| avg_user_score | Average score of all the games ratings for the year | float |
critics_avg_year_rating table
critics_avg_year_rating table| Column | Definition | Data Type |
|---|---|---|
| year | Release year of the games reviewed | int |
| num_games | Number of games released that year | int |
| avg_critic_score | Average score of all the games ratings for the year | float |
-- best_selling_games
SELECT *
FROM game_sales
ORDER BY games_sold DESC
LIMIT 10; -- critics_top_ten_years
SELECT year, num_games, ROUND(avg_critic_score,2) AS avg_critic_score
FROM critics_avg_year_rating
WHERE num_games > 4
ORDER BY avg_critic_score DESC
LIMIT 10;
-- golden_years
SELECT critics.year,
critics.num_games,
critics.avg_critic_score,
users.avg_user_score,
critics.avg_critic_score - users.avg_user_score AS diff
FROM critics_avg_year_rating AS critics
INNER JOIN users_avg_year_rating AS users
ON critics.year = users.year
WHERE users.avg_user_score > 9 OR critics.avg_critic_score > 9
ORDER BY critics.year;Who are the developers that made the top 10 games?
This question is important because it can dictate if there are any developers that create top games. These developers would be highly sought after to develop games. The criteria for being considered a top 10 game though can depend on how it is being ranked (# of games sold, critic score or user score).
Ranking the data by number of games sold, critic score, and user score will display different results. Knowing how the data will be ranked is an important decision because the results could be used later on to determine if the top ranked games have a preferred platform or publisher besdies the preferred developer. Also, the data could then display what year the top games were released to determine how recent the top games were created.
SELECT
game_sales.name,
game_sales.developer,
game_sales.games_sold,
reviews.critic_score,
reviews.user_score,
RANK() OVER(ORDER BY game_sales.games_sold DESC) AS games_ranked
FROM game_sales
INNER JOIN reviews
ON game_sales.name = reviews.name
LIMIT 10;SELECT
game_sales.name,
game_sales.developer,
game_sales.games_sold,
reviews.critic_score,
reviews.user_score,
RANK() OVER(ORDER BY reviews.critic_score DESC) AS ranked_critic_score
FROM game_sales
INNER JOIN reviews
ON game_sales.name = reviews.name
LIMIT 10;SELECT
game_sales.name,
game_sales.developer,
game_sales.games_sold,
reviews.critic_score,
reviews.user_score,
RANK() OVER(ORDER BY reviews.user_score DESC) AS ranked_user_score
FROM game_sales
INNER JOIN reviews
ON game_sales.name = reviews.name
WHERE reviews.user_score IS NOT NULL
LIMIT 10;