Which board game should you play?
Background
After a tiring week, what better way to unwind than a board game night with friends and family? But the question remains: which game should you pick? You have gathered a dataset containing information on over 20,000 board games. It's time to put your analytical skills to work and use data-driven insights to persuade your group to try the game you've chosen!
Data Summary
The number of players (minimum and maximum) shows a very small negative correlation with a game's average rating. The average rating rises with most of the other features: it has a slight positive correlation with play time, minimum age, complexity average, and the number of users who own the game. A few key insights: from 1970 to 2022, play time has decreased slightly, while the number of users rating games, the average rating, and the number of users who own games have all gone up. Games published after 2020 have a higher play time (with high peaks), a higher minimum age, and a high average rating, at the cost of a sharp decrease in complexity average.
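As a rough illustration of how correlations like these can be read off a dataframe, here is a minimal sketch. The rows in `toy` are invented for the example and are not taken from bgg_data.csv, and `rating_correlations` is a hypothetical helper name; only the column names come from the data dictionary.

```python
import pandas as pd

# Invented toy rows for illustration only (NOT values from bgg_data.csv).
toy = pd.DataFrame({
    "Max Players": [8, 6, 5, 4, 2],
    "Play Time": [30, 45, 60, 90, 120],
    "Complexity Average": [1.2, 1.8, 2.5, 3.1, 3.9],
    "Rating Average": [5.9, 6.3, 6.8, 7.4, 7.9],
})

def rating_correlations(df, target="Rating Average"):
    """Pearson correlation of each numeric column with the target column."""
    numeric = df.select_dtypes("number")
    corr = numeric.corr()[target].drop(target)
    return corr.sort_values(ascending=False)

result = rating_correlations(toy)
print(result)
```

On the real data, the same helper could be called on `games` once the preprocessing is done. The sign of each value gives the direction of the relationship and its magnitude gives the strength.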
The Data
You've come across a dataset titled bgg_data.csv containing details on over 20,000 ranked board games from the BoardGameGeek (BGG) website. BGG is the premier online hub for board game enthusiasts, hosting data on more than 100,000 games, inclusive of both ranked and unranked varieties. This platform thrives due to its active community, who contribute by posting reviews, ratings, images, videos, session reports, and participating in live discussions.
This specific dataset, assembled in February 2021, encompasses all ranked games listed on BGG up to that date. Games without a ranking were left out because they didn't garner enough reviews; for a game to earn a rank, it needs a minimum of 30 votes.
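The 30-vote rule suggests a quick sanity check on the data. The sketch below assumes that the Users Rated column is the vote count the rule refers to, which the description does not state explicitly. The rows are invented and `MIN_VOTES_FOR_RANK` is a name introduced here for illustration.

```python
import pandas as pd

# Threshold taken from the dataset description above.
MIN_VOTES_FOR_RANK = 30

# Invented rows for illustration only (NOT values from bgg_data.csv).
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Users Rated": [30, 1250, 12],
})

# Assumption: "Users Rated" is the vote count used for ranking.
below_threshold = toy[toy["Users Rated"] < MIN_VOTES_FOR_RANK]
print(f"{len(below_threshold)} row(s) below the {MIN_VOTES_FOR_RANK}-vote threshold")
```

If the assumption holds, running the same filter on the full dataset should return no rows. Any rows that do appear would be worth a closer look.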
In this dataset, each row represents one board game, described by the following columns.
| Column | Description |
|---|---|
| ID | The ID of the board game. |
| Name | The name of the board game. |
| Year Published | The year when the game was published. |
| Min Players | The minimum number of players recommended for the game. |
| Max Players | The maximum number of players recommended for the game. |
| Play Time | The average play time suggested by game creators, measured in minutes. |
| Min Age | The recommended minimum age of players. |
| Users Rated | The number of users who rated the game. |
| Rating Average | The average rating of the game, on a scale of 1 to 10. |
| BGG Rank | The rank of the game on the BoardGameGeek (BGG) website. |
| Complexity Average | The average complexity value of the game, on a scale of 1 to 5. |
| Owned Users | The number of BGG registered owners of the game. |
| Mechanics | The mechanics used by the game. |
| Domains | The board game domains that the game belongs to. |
Source: Dilini Samarasinghe, July 5, 2021, "BoardGameGeek Dataset on Board Games", IEEE Dataport, doi: https://dx.doi.org/10.21227/9g61-bs59.
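Mechanics and Domains can hold several values per game, so they usually need to be split before counting. The sketch below assumes the values are stored as a single comma-separated string. The data dictionary does not confirm this, so check it against the file first. The rows are invented for illustration.

```python
import pandas as pd

# Invented rows; the comma-separated storage format is an ASSUMPTION.
toy = pd.DataFrame({
    "Name": ["Game A", "Game B"],
    "Mechanics": ["Dice Rolling, Set Collection", "Dice Rolling"],
})

# Split each string into a list, then give every mechanic its own row.
mechanics_long = (
    toy.assign(Mechanics=toy["Mechanics"].str.split(","))
       .explode("Mechanics")
)
# Remove the whitespace left over after splitting on commas.
mechanics_long["Mechanics"] = mechanics_long["Mechanics"].str.strip()

mechanic_counts = mechanics_long["Mechanics"].value_counts()
print(mechanic_counts)
```

The long format keeps the game name on every row, so the counts can later be joined back to ratings or other columns.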
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
sns.set_palette('crest')

# Load in and preview a random sample of the data
games = pd.read_csv('data/bgg_data.csv')
games.sample(15, random_state = 45)

Data preprocessing
# Check for null values
games.isnull().sum()

# Drop the observations with null values from the columns ID, Year Published, Owned Users
games = games.dropna(subset = ['ID', 'Year Published', 'Owned Users'])
# Verify the null values remaining
games.isnull().sum()

# Check the column data types
games.dtypes

# Change the data type of the columns ID, Year Published, Owned Users from float to int
games['ID'] = games['ID'].astype('int')
games['Year Published'] = games['Year Published'].astype('int')
games['Owned Users'] = games['Owned Users'].astype('int')
# Verify the columns data types
games.dtypes

# Check summary data statistics for the dataframe
games.describe()

# Data with possible errors
games[games['Year Published'] <= 0]

Observation
Negative values in Year Published are not errors: they are BC years belonging to very old games, so they will be kept in the dataset.
Games with a Year Published of 0 appear to be games whose publication year is missing, so we will drop these observations.
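Before dropping anything, it can help to see how many rows fall into each case. This is a minimal sketch on invented rows. The Year Category column and its labels are introduced here for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration only (NOT values from bgg_data.csv).
toy = pd.DataFrame({
    "Name": ["Ancient game", "Undated game", "Modern game"],
    "Year Published": [-2200, 0, 2017],
})

# Label each row: negative years are BC, 0 is treated as a missing year.
conditions = [toy["Year Published"] < 0, toy["Year Published"] == 0]
labels = ["BC year", "missing year"]
toy["Year Category"] = np.select(conditions, labels, default="AD year")

print(toy["Year Category"].value_counts())
```

Running the same labelling on the real dataframe would show how many observations the year-0 filter removes and how many BC games remain.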
# Drop the observations where Year Published is 0
games = games[games['Year Published'] != 0]