Which board game should you play?

📖 Background

After a tiring week, what better way to unwind than a board game night with friends and family? But the question remains: which game should you pick? You have gathered a dataset containing information on over 20,000 board games. It's time to put your analytical skills to work and use data-driven insights to persuade your group to try the game you've chosen!

💾 The Data

You've come across a dataset titled bgg_data.csv containing details on over 20,000 ranked board games from the BoardGameGeek (BGG) website. BGG is the premier online hub for board game enthusiasts, hosting data on more than 100,000 games, inclusive of both ranked and unranked varieties. This platform thrives due to its active community, who contribute by posting reviews, ratings, images, videos, session reports, and participating in live discussions.

This specific dataset, assembled in February 2021, encompasses all ranked games listed on BGG up to that date. Games without a ranking were left out because they didn't garner enough reviews; for a game to earn a rank, it needs a minimum of 30 votes.

In this dataset, each row represents one board game, described by the following columns.

| Column | Description |
| --- | --- |
| ID | The ID of the board game. |
| Name | The name of the board game. |
| Year Published | The year when the game was published. |
| Min Players | The minimum number of players recommended for the game. |
| Max Players | The maximum number of players recommended for the game. |
| Play Time | The average play time suggested by game creators, measured in minutes. |
| Min Age | The recommended minimum age of players. |
| Users Rated | The number of users who rated the game. |
| Rating Average | The average rating of the game, on a scale of 1 to 10. |
| BGG Rank | The rank of the game on the BoardGameGeek (BGG) website. |
| Complexity Average | The average complexity value of the game, on a scale of 1 to 5. |
| Owned Users | The number of BGG registered owners of the game. |
| Mechanics | The mechanics used by the game. |
| Domains | The board game domains that the game belongs to. |

Source: Dilini Samarasinghe, July 5, 2021, "BoardGameGeek Dataset on Board Games", IEEE Dataport, doi: https://dx.doi.org/10.21227/9g61-bs59.
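The 30-vote threshold mentioned in the dataset description can be turned into a quick sanity check once the data is loaded. The snippet below is only a sketch on a small made-up frame: the helper name `below_vote_threshold`, the constant `MIN_VOTES_FOR_RANK`, and the toy rows are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Threshold taken from the dataset description: a game needs at least 30 votes to be ranked.
MIN_VOTES_FOR_RANK = 30

def below_vote_threshold(df: pd.DataFrame, min_votes: int = MIN_VOTES_FOR_RANK) -> pd.DataFrame:
    """Return the rows whose `Users Rated` count is below the ranking threshold."""
    return df[df["Users Rated"] < min_votes]

# Hypothetical toy data; the real check would be run on the loaded `boardgame` frame.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Users Rated": [30, 29, 1200],
})

suspicious = below_vote_threshold(toy)
print(suspicious)
```

If the description holds for the real file, the same call on the full dataset should return an empty frame; any rows it does return would be worth inspecting before analysis.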

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
boardgame = pd.read_csv('data/bgg_data.csv')
boardgame

Data Cleaning Process

Meaning of variables and their type

def boardgame_info():
    """Summarise each column of `boardgame`: dtype, null counts, and unique values."""
    temp = pd.DataFrame(index=boardgame.columns)
    temp["Datatype"] = boardgame.dtypes
    temp["Not null values"] = boardgame.count()
    temp["Null Values"] = boardgame.isnull().sum()
    temp["Percentage of Null Values"] = (boardgame.isnull().mean())*100
    temp["Unique count"] = boardgame.nunique()
    return temp
boardgame_info()

Solving the ID problem

# Creating a sequence of numbers to substitute the id
sequence_of_numbers = list(range(1, len(boardgame) + 1))
boardgame['ID'] = sequence_of_numbers

Finding the null Year

# Find the rows with null values in the `Year Published` column
null_values_mask = boardgame['Year Published'].isnull()
null_values_rows = boardgame[null_values_mask]

# Display the rows with null values in the `Year Published` column
null_values_rows

# Fill in the missing year for the game with ID 13985
boardgame.loc[boardgame['ID'] == 13985, 'Year Published'] = 1855
# Cleaned data: re-run the summary defined earlier to check the remaining null counts
boardgame_info()

Creating new columns for better analysis

Relying solely on Rating Average is insufficient for our analysis: a game could have a Rating Average of 9 from only a few dozen users (ranked games need just 30 votes), and such an average says little about the game's overall quality. To address this limitation, I introduced the Weighted Rating column, which prioritizes games with high ratings from a larger number of users.
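The exact formula behind Weighted Rating is not given here, so the following is only one plausible sketch: a Bayesian-style average that pulls each game's rating toward the overall mean, with the pull weakening as the number of ratings grows. The function name `add_weighted_rating`, the `min_votes` parameter and its value of 500, and the toy games are all assumptions for illustration.

```python
import pandas as pd

def add_weighted_rating(df: pd.DataFrame, min_votes: float) -> pd.DataFrame:
    """Return a copy of df with a `Weighted Rating` column.

    Assumed formula (not confirmed by the source):
        WR = v / (v + m) * R + m / (v + m) * C
    where v = Users Rated, R = Rating Average, m = min_votes,
    and C = mean Rating Average across all games in df.
    """
    out = df.copy()
    v = out["Users Rated"]
    r = out["Rating Average"]
    c = r.mean()
    out["Weighted Rating"] = (v / (v + min_votes)) * r + (min_votes / (v + min_votes)) * c
    return out

# Hypothetical toy data standing in for the real `boardgame` frame.
toy = pd.DataFrame({
    "Name": ["Niche Gem", "Crowd Favourite", "Average Joe"],
    "Users Rated": [35, 50000, 1000],
    "Rating Average": [9.0, 8.5, 6.5],
})

ranked = add_weighted_rating(toy, min_votes=500).sort_values("Weighted Rating", ascending=False)
print(ranked[["Name", "Users Rated", "Rating Average", "Weighted Rating"]])
```

With this weighting, the game rated 9.0 by only 35 users is pulled close to the overall mean, while the game rated 8.5 by 50,000 users keeps almost all of its score and ranks first. The choice of `min_votes` controls how strongly sparsely rated games are discounted and would need tuning against the real data.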
