Which board game should you play?
Background
After a tiring week, what better way to unwind than a board game night with friends and family? But the question remains: which game should you pick? You have gathered a dataset containing information on more than 20,000 board games. It's time to put your analytical skills to work and use data-driven insights to persuade your group to try the game you've chosen!
The Data
You've come across a dataset titled bgg_data.csv containing details on over 20,000 ranked board games from the BoardGameGeek (BGG) website. BGG is the premier online hub for board game enthusiasts, hosting data on more than 100,000 games, inclusive of both ranked and unranked varieties. This platform thrives due to its active community, who contribute by posting reviews, ratings, images, videos, session reports, and participating in live discussions.
This specific dataset, assembled in February 2021, encompasses all ranked games listed on BGG up to that date. Games without a ranking were left out because they didn't garner enough reviews; for a game to earn a rank, it needs a minimum of 30 votes.
In this dataset, each row represents one board game, described by the columns below.
| Column | Description |
|---|---|
| ID | The ID of the board game. |
| Name | The name of the board game. |
| Year Published | The year when the game was published. |
| Min Players | The minimum number of players recommended for the game. |
| Max Players | The maximum number of players recommended for the game. |
| Play Time | The average play time suggested by game creators, measured in minutes. |
| Min Age | The recommended minimum age of players. |
| Users Rated | The number of users who rated the game. |
| Rating Average | The average rating of the game, on a scale of 1 to 10. |
| BGG Rank | The rank of the game on the BoardGameGeek (BGG) website. |
| Complexity Average | The average complexity value of the game, on a scale of 1 to 5. |
| Owned Users | The number of BGG registered owners of the game. |
| Mechanics | The mechanics used by the game. |
| Domains | The board game domains that the game belongs to. |
Source: Dilini Samarasinghe, July 5, 2021, "BoardGameGeek Dataset on Board Games", IEEE Dataport, doi: https://dx.doi.org/10.21227/9g61-bs59.
import pandas as pd
boardgame = pd.read_csv('data/bgg_data.csv')
boardgame
To select a board game for an evening with friends and family, I set the following criteria to match our preferences for the occasion:
- Playtime Limit: Maximum 240 minutes
- Minimum Players: 4
- Complexity Level: Between 1 and 3
- Average Rating Requirement: Above 6.5
- Number of Ratings: At least 20,000 user ratings
My analysis is based on these criteria.
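The five criteria above can be expressed as one boolean mask. The following is a minimal sketch rather than the notebook's own filter: the `games` DataFrame holds made-up rows standing in for `bgg_data.csv`, and it assumes that the complexity and ratings-count bounds are inclusive and that "Minimum Players: 4" means `Min Players >= 4`.

```python
import pandas as pd

# Hypothetical sample rows standing in for bgg_data.csv (all values are made up)
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Min Players": [4, 2, 4, 5],
    "Play Time": [60, 90, 300, 120],
    "Complexity Average": [2.1, 2.5, 1.8, 3.6],
    "Rating Average": [7.2, 8.0, 7.0, 6.9],
    "Users Rated": [25000, 40000, 30000, 21000],
})

# One boolean condition per criterion, combined with & (element-wise AND)
meets_criteria = (
    (games["Play Time"] <= 240)
    & (games["Min Players"] >= 4)
    & games["Complexity Average"].between(1, 3)  # inclusive bounds (assumption)
    & (games["Rating Average"] >= 6.5)
    & (games["Users Rated"] >= 20000)            # inclusive bound (assumption)
)

shortlist = games[meets_criteria]
print(shortlist["Name"].tolist())
```

With these sample rows only "Game A" passes: the others fail on minimum players, play time, and complexity respectively.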
import pandas as pd
# Load the raw data and inspect column types, missing values and duplicates
boardgame_raw = pd.read_csv('data/bgg_data.csv')
boardgame_raw.info()
boardgame_raw.isna().sum()
boardgame_raw.duplicated().sum()
# Drop rows with missing values; .copy() avoids SettingWithCopyWarning on the casts below
boardgame_unfiltered = boardgame_raw.dropna().copy()
boardgame_unfiltered['ID'] = boardgame_unfiltered['ID'].astype('int')
boardgame_unfiltered['Year Published'] = boardgame_unfiltered['Year Published'].astype('int')
boardgame_unfiltered['Owned Users'] = boardgame_unfiltered['Owned Users'].astype('int')
print(boardgame_unfiltered.dtypes)
print(boardgame_unfiltered.isna().sum())
# Defining my criteria
max_play_time = 240
min_players = 4
min_avg_rating = 6.5
# Filtering the dataset by my criteria
boardgame_filtered = boardgame_unfiltered[
    (boardgame_unfiltered['Play Time'] <= max_play_time) &
    (boardgame_unfiltered['Min Players'] >= min_players) &
    (boardgame_unfiltered['Rating Average'] >= min_avg_rating)
]
boardgame_filtered.head()
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(boardgame_filtered['Rating Average'], kde=True, bins=10)
plt.title('Distribution of Average Ratings')
plt.show()
This histogram shows how average ratings are distributed among the games that meet my criteria. Most of these games have an average rating between 6.5 and 7.5.
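The concentration seen in the histogram can also be checked numerically. This is a small sketch under stated assumptions: the `ratings` Series contains made-up values standing in for `boardgame_filtered['Rating Average']`, so the printed share is illustrative only.

```python
import pandas as pd

# Hypothetical ratings standing in for boardgame_filtered['Rating Average']
ratings = pd.Series([6.5, 6.8, 7.0, 7.4, 7.9, 8.3, 6.6, 7.1])

# Fraction of games whose average rating lies in the 6.5 to 7.5 band (inclusive)
share_in_band = ratings.between(6.5, 7.5).mean()
print(f"{share_in_band:.0%} of games are rated between 6.5 and 7.5")
```

Taking the mean of a boolean Series gives the proportion of `True` values, which avoids counting and dividing by hand.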
# creating a scatterplot
plt.scatter(
    boardgame_filtered['Play Time'],
    boardgame_filtered['Complexity Average'],
    c=boardgame_filtered['Rating Average'],
    cmap='coolwarm',
    alpha=0.7
)
plt.colorbar(label='Average Rating')
plt.xlabel('Play Time (minutes)')
plt.ylabel('Complexity Average')
plt.title('Play Time vs. Complexity')
# Adding a trendline
sns.regplot(
    x='Play Time',
    y='Complexity Average',
    data=boardgame_filtered,
    scatter=False,
    color='grey',
    line_kws={'linewidth': 2}
)
plt.show()
This scatter plot compares play time and complexity for the filtered board games, with each point colored by its average rating. The x-axis shows play time, from roughly 50 to 250 minutes, and the y-axis shows complexity, from roughly 2 to 4.5. The trendline suggests a weak positive correlation between play time and complexity.
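To put a number on the relationship in the scatter plot, the Pearson correlation coefficient between the two columns can be computed directly. The sketch below uses a made-up `sample` DataFrame rather than the real filtered data, so its result says nothing about the actual dataset.

```python
import pandas as pd

# Hypothetical values standing in for the filtered games
sample = pd.DataFrame({
    "Play Time": [60, 90, 120, 180, 240],
    "Complexity Average": [2.0, 2.4, 2.2, 3.1, 3.0],
})

# Pearson correlation coefficient: close to 0 is weak, close to 1 or -1 is strong
r = sample["Play Time"].corr(sample["Complexity Average"])
print(round(r, 2))
```

Applied to the notebook's data, the same call would be `boardgame_filtered['Play Time'].corr(boardgame_filtered['Complexity Average'])`.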
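# Creating a correlation matrix from the numeric columns only
# (text columns such as Name, Mechanics and Domains cannot be correlated)
correlation_matrix = boardgame_filtered.select_dtypes(include='number').corr()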
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
The correlation matrix shows the pairwise relationships between the numeric attributes. Notable correlations include a positive relationship between Play Time and Complexity Average, and a negative relationship between Year Published and Rating Average. Overall, the matrix summarizes the dependencies and patterns among the games that meet my criteria.
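A heatmap with many cells can be hard to scan, so one option is to list the column pairs ordered by the strength of their correlation. This is an illustrative sketch: `sample` is a made-up DataFrame, and the upper-triangle masking is one common way to list each pair once, not something the original notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the numeric columns of the filtered games
sample = pd.DataFrame({
    "Play Time": [60, 90, 120, 180, 240],
    "Complexity Average": [2.0, 2.4, 2.2, 3.1, 3.0],
    "Year Published": [2016, 2005, 2012, 1999, 2008],
    "Rating Average": [7.4, 6.9, 7.2, 6.6, 7.0],
})

corr = sample.corr()

# Keep only the cells above the diagonal so each pair appears exactly once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to a Series indexed by (column A, column B), strongest pairs first
pairs = upper.stack().dropna().sort_values(key=abs, ascending=False)
print(pairs)
```

Sorting by absolute value keeps strong negative correlations near the top alongside strong positive ones.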