Which board game should you play?

📖 Background

After a tiring week, what better way to unwind than a board game night with friends and family? But the question remains: which game should you pick? You have gathered a dataset containing information on over 20,000 board games. It's time to use your analytical skills and data-driven insights to persuade your group to try the game you've chosen!

💾 The Data

You've come across a dataset titled bgg_data.csv containing details on over 20,000 ranked board games from the BoardGameGeek (BGG) website. BGG is the premier online hub for board game enthusiasts, hosting data on more than 100,000 games, inclusive of both ranked and unranked varieties. This platform thrives due to its active community, who contribute by posting reviews, ratings, images, videos, session reports, and participating in live discussions.

This specific dataset, assembled in February 2021, encompasses all ranked games listed on BGG up to that date. Games without a ranking were left out because they didn't garner enough reviews; for a game to earn a rank, it needs a minimum of 30 votes.

In this dataset, each row represents one board game, described by the following columns.

| Column | Description |
| --- | --- |
| ID | The ID of the board game. |
| Name | The name of the board game. |
| Year Published | The year when the game was published. |
| Min Players | The minimum number of players recommended for the game. |
| Max Players | The maximum number of players recommended for the game. |
| Play Time | The average play time suggested by game creators, measured in minutes. |
| Min Age | The recommended minimum age of players. |
| Users Rated | The number of users who rated the game. |
| Rating Average | The average rating of the game, on a scale of 1 to 10. |
| BGG Rank | The rank of the game on the BoardGameGeek (BGG) website. |
| Complexity Average | The average complexity value of the game, on a scale of 1 to 5. |
| Owned Users | The number of BGG registered owners of the game. |
| Mechanics | The mechanics used by the game. |
| Domains | The board game domains that the game belongs to. |

Source: Dilini Samarasinghe, July 5, 2021, "BoardGameGeek Dataset on Board Games", IEEE Dataport, doi: https://dx.doi.org/10.21227/9g61-bs59.

import pandas as pd
import numpy as np
import seaborn as sns
boardgame = pd.read_csv('data/bgg_data.csv')
boardgame

💪 Challenge

Explore and analyze the board game data, and share the intriguing insights with your friends through a report. Here are some steps that might help you get started:

  • Is this dataset ready for analysis? Some variables have inappropriate data types, and there are outliers and missing values. Apply data cleaning techniques to preprocess the dataset.
  • Use data visualization techniques to draw further insights from the dataset.
  • Find out if the number of players impacts the game's average rating.
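
The first step mentions outliers. One common way to flag them is the interquartile range (IQR) rule. The sketch below is only an illustration on hypothetical toy values, not the dataset itself; the helper name `iqr_outlier_mask` and the multiplier `k=1.5` are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical toy data; the real values would come from bgg_data.csv.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Play Time": [30, 45, 60, 60, 90, 6000],
})

def iqr_outlier_mask(series, k=1.5):
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

mask = iqr_outlier_mask(toy["Play Time"])
outliers = toy[mask]
print(outliers)  # only game "F" (6000 minutes) is flagged
```

Whether a flagged row should be dropped, capped, or kept is a judgment call; a very long play time may be a genuine property of the game rather than a data error.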

🧑‍⚖️ Judging criteria

This is a community-based competition. The top 5 most upvoted entries will win.

The winners will receive DataCamp merchandise.

✅ Checklist before publishing into the competition

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your story.
  • Make sure the workbook reads well and explains how you found your insights.
  • Try to include an executive summary of your recommendations at the beginning.
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

# Data verification, cleaning and manipulation: Is this dataset ready for analysis? Some variables have inappropriate data types, and there are outliers and missing values. Apply data cleaning techniques to preprocess the dataset.

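## 1. Checking for missing values and duplicates
#print(boardgame.isna().sum())
# Drop rows with no owner count; passing the column as a list works across pandas versions
boardgame_dropna = boardgame.dropna(subset=['Owned Users'])
print(boardgame_dropna.isna().sum())
# Count rows that are exact duplicates of an earlier row
x = boardgame_dropna.duplicated().sum()
print(f'Number of duplicate rows: {x}')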

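## 2. Checking for data types
boardgame.info()  # info() prints its report directly and returns None
### 2.1 Checking Year Published to ensure no entry is later than 2021 (the data was compiled in Feb 2021)
print(f"latest year before filtering: {boardgame_dropna['Year Published'].max()}")
# This comparison also drops rows where Year Published is missing
boardgame_dropna_year = boardgame_dropna[boardgame_dropna['Year Published'] <= 2021]
x = boardgame_dropna_year['Year Published'].max()
print(f'latest game published in {x}')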


### 2.2 Drop meaningless entries
# Keep only games that allow at least one player
bg_drnayp = boardgame_dropna_year[boardgame_dropna_year['Max Players'] >= 1]
x = bg_drnayp['Max Players'].min()
print(f'max players number check: {x}')

## Check that max players is not smaller than min players for any game
x = bg_drnayp['Max Players'] - bg_drnayp['Min Players']
print(f'players inequality check: {np.sum(x < 0)}')

# Games with a known domain, kept separately for domain-level analysis
df = bg_drnayp.dropna(subset=['Domains'])
#print(df.isna().sum())
print(bg_drnayp.describe())
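## 3. Data manipulation
### 3.1 Splitting Domains by comma into separate columns (main domain plus up to two sub-domains)
# Split on a comma and any following spaces so the values carry no leading whitespace
Domains = boardgame_dropna_year['Domains'].str.split(r',\s*', expand=True)
# Assumes the split yields exactly three columns; otherwise this assignment raises an error
Domains.columns = ['Main', 'Sub1', 'Sub2']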

### 3.2 Creating game type
### Checking the range of min players and max players
players = [bg_drnayp['Min Players'], bg_drnayp['Max Players']]
for col in players:
    x = col.min()
    y = col.max()
    print(f'{col.name} has a range of {x} to {y}')

#### Based on the ranges of min and max players, add a game size: small (up to 10 players), medium (11 to 50 players), and large (more than 50 players)
gamesize = ['small', 'medium', 'large']
trial = bg_drnayp.copy()
# Bins are right-inclusive: (0, 10], (10, 50], (50, 1000]
trial['game size'] = pd.cut(trial['Max Players'], [0, 10, 50, 1000], labels=gamesize)
print(trial['game size'].describe())
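
To see how the bin edges used for the game size behave at the boundaries, the sketch below runs `pd.cut` on a few hypothetical values chosen to sit on or next to each edge. The edges and labels match the cell above; the names `edges`, `labels`, and `out_of_range` are just for this illustration.

```python
import pandas as pd

edges = [0, 10, 50, 1000]
labels = ["small", "medium", "large"]

# Hypothetical values placed on or next to the bin edges.
max_players = pd.Series([1, 10, 11, 50, 51, 999], name="Max Players")

# pd.cut closes each bin on the right by default: (0, 10], (10, 50], (50, 1000].
sizes = pd.cut(max_players, bins=edges, labels=labels)
print(sizes.tolist())
# ['small', 'small', 'medium', 'medium', 'large', 'large']

# Values outside the outer edges (0 or below, above 1000) are not assigned a label.
out_of_range = pd.cut(pd.Series([0, 1001]), bins=edges, labels=labels)
print(out_of_range.isna().tolist())
# [True, True]
```

Because the left edge is open, a game with 0 max players would receive no label, which is one reason to filter such rows out before binning.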

1.Domain

1.1 which domain has the highest rating

1.2 which domain has the most users

1.3 which domain has the highest average playtime, i.e. average play time per game

1.4 Which domain is the most popular in each age group
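
The domain questions above could be approached by giving each domain its own row and then aggregating. The following is a minimal sketch on hypothetical toy rows that mimic the comma-separated `Domains` column; the output column names (`mean_rating`, `total_raters`, `games`) are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the dataset description.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Domains": ["Strategy Games, Thematic Games", "Family Games", "Strategy Games", None],
    "Rating Average": [8.0, 6.0, 7.0, 5.0],
    "Users Rated": [1000, 200, 300, 50],
})

# Split on commas (plus any following spaces) and give each domain its own row.
long = (
    toy.dropna(subset=["Domains"])
       .assign(Domain=lambda d: d["Domains"].str.split(r",\s*"))
       .explode("Domain")
)

# Aggregate per domain and sort by mean rating, highest first.
by_domain = long.groupby("Domain").agg(
    mean_rating=("Rating Average", "mean"),
    total_raters=("Users Rated", "sum"),
    games=("Name", "count"),
).sort_values("mean_rating", ascending=False)
print(by_domain)
```

With this approach a game that lists two domains counts once in each of them, so the per-domain totals add up to more than the number of games.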

2. Time

2.1 Which year has the most playtime, and what percentage each category contributes

2.2 Which age group spends the most time playing, in total and per player

2.3

3. Players

3.1 Which age group has the most players

3.2 Which age group has the longest playtime, in total and per player

3.3 Does a stricter age limit affect playtime?
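
The dataset has no age group column, so the age questions need one derived from `Min Age`. The sketch below shows one possible way on hypothetical toy values; the band edges and labels are assumptions made for illustration, not something defined by the dataset.

```python
import pandas as pd

# Hypothetical toy data; column names follow the dataset description.
toy = pd.DataFrame({
    "Min Age": [0, 6, 8, 10, 12, 14, 18],
    "Play Time": [10, 20, 30, 45, 60, 120, 180],
})

# Assumed age bands: (-1, 7], (7, 12], (12, 17], (17, 120].
age_edges = [-1, 7, 12, 17, 120]
age_labels = ["under 8", "8-12", "13-17", "18+"]
toy["age group"] = pd.cut(toy["Min Age"], bins=age_edges, labels=age_labels)

# Average play time per age band.
play_time_by_age = toy.groupby("age group", observed=True)["Play Time"].mean()
print(play_time_by_age)

# A simple first look at question 3.3: linear correlation between minimum age and play time.
age_playtime_corr = toy["Min Age"].corr(toy["Play Time"])
print(f"Pearson correlation: {age_playtime_corr:.2f}")
```

A correlation only describes association; it does not show that a stricter age limit causes longer play time.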

4. Rating

4.1 which domain has the best rating

4.2 rating tendency across age group

4.3 relation between number of players and rating

Whether the number of players impacts the game's average rating
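
One way to look at this question is to compare ratings across player counts and to compute a rank correlation. The sketch below uses hypothetical toy values, and it uses `Max Players` as the measure of player count, which is an assumption; `Min Players` could be used the same way.

```python
import pandas as pd

# Hypothetical toy data; column names follow the dataset description.
toy = pd.DataFrame({
    "Max Players": [2, 2, 4, 4, 6, 6, 8, 8],
    "Rating Average": [7.8, 7.4, 7.0, 6.8, 6.5, 6.1, 5.9, 5.5],
})

# Mean and median rating per player count, with the number of games in each group.
by_players = toy.groupby("Max Players")["Rating Average"].agg(["mean", "median", "count"])
print(by_players)

# Spearman rank correlation, computed as the Pearson correlation of the ranks.
# It measures monotonic association and is less sensitive to extreme player counts.
spearman = toy["Max Players"].rank().corr(toy["Rating Average"].rank())
print(f"Spearman correlation: {spearman:.2f}")
```

A strong correlation in the real data would show that player count and rating move together, not that one causes the other; complexity or domain could drive both.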

5. Game type

5.1 which game type has the most playtime

5.2 which game type has the best rating

5.3 which game type has the most players

5.4 what is the most popular domain for each game type
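
The game type questions can be summarized in a single grouped aggregation. This is a minimal sketch on hypothetical toy rows, assuming "game type" means the game size derived from `Max Players` with the bins used earlier; the summary column names are made up for the example.

```python
import pandas as pd

# Hypothetical toy data; column names follow the dataset description.
toy = pd.DataFrame({
    "Max Players": [2, 4, 8, 12, 30, 100],
    "Play Time": [30, 60, 90, 45, 20, 15],
    "Rating Average": [7.5, 7.0, 6.5, 6.0, 5.5, 5.0],
    "Owned Users": [5000, 3000, 1000, 800, 400, 100],
})
toy["game size"] = pd.cut(
    toy["Max Players"], [0, 10, 50, 1000], labels=["small", "medium", "large"]
)

# One row per game size; observed=True skips size categories with no games.
summary = toy.groupby("game size", observed=True).agg(
    mean_play_time=("Play Time", "mean"),
    mean_rating=("Rating Average", "mean"),
    total_owners=("Owned Users", "sum"),
)
print(summary)
```

On the real data the size groups are likely to be very unequal in count, so it helps to report the number of games in each group alongside the averages.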