Competition - Board Games
Which board game should you play?
Background
After a tiring week, what better way to unwind than a board game night with friends and family? But the question remains: which game should you pick? You have gathered a dataset containing information on over 20,000 board games. It's time to use your analytical skills and data-driven insights to persuade your group to try the game you've chosen!
import pandas as pd
boardgame = pd.read_csv('data/bgg_data.csv')
boardgame
Challenge
Explore and analyze the board game data, and share the intriguing insights with your friends through a report. Here are some steps that might help you get started:
- Is this dataset ready for analysis? Some variables have inappropriate data types, and there are outliers and missing values. Apply data cleaning techniques to preprocess the dataset.
- Use data visualization techniques to draw further insights from the dataset.
- Find out whether the number of players affects a game's average rating.
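One way to approach the last question is to group games by their maximum player count, compare the mean rating per group, and back that up with a rank correlation. The sketch below is a minimal illustration, not the notebook's method: it assumes the data has numeric `Max Players` and `Rating Average` columns (an assumption about bgg_data.csv; check `boardgame.info()`), the bucket edges are arbitrary, and the small frame is made up so the code runs on its own.

```python
import pandas as pd

# Made-up stand-in for the board game data. The column names
# 'Max Players' and 'Rating Average' are assumptions about the real file.
games = pd.DataFrame({
    "Max Players": [2, 2, 4, 4, 5, 6, 8, 10],
    "Rating Average": [7.1, 6.8, 7.4, 6.9, 7.0, 6.5, 6.2, 6.0],
})


def rating_by_player_count(df: pd.DataFrame) -> pd.DataFrame:
    """Mean rating and number of games per maximum-player-count bucket."""
    # Right-closed bins (0, 2], (2, 4], (4, 6], (6, inf); the edges are an
    # arbitrary choice, and a max player count of 0 falls outside every bin.
    buckets = pd.cut(
        df["Max Players"],
        bins=[0, 2, 4, 6, float("inf")],
        labels=["1-2", "3-4", "5-6", "7+"],
    )
    return (
        df.groupby(buckets, observed=True)["Rating Average"]
        .agg(["mean", "count"])
    )


summary = rating_by_player_count(games)

# Spearman rank correlation, computed as the Pearson correlation of the
# average ranks (equivalent, and avoids a SciPy dependency). Ranks are less
# sensitive than raw values to a few very large player counts.
rho = games["Max Players"].rank().corr(games["Rating Average"].rank())

print(summary)
print(f"Spearman rho: {rho:.2f}")
```

On the real data you would pass the cleaned `boardgame` frame instead of `games`. A rho close to 0 would suggest player count has little monotonic relationship with rating; the `count` column is worth reading alongside the means, since sparse buckets give noisy averages.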
 
Judging criteria
This is a community-based competition. The top 5 most upvoted entries will win.
The winners will receive DataCamp merchandise.
Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. Leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria, so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights.
- Try to include an executive summary of your recommendations at the beginning.
- Check that all the cells run without error.
 
Time is ticking. Good luck!
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import nltk
from sklearn.model_selection import train_test_split
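# Inspect column dtypes and non-null counts.
boardgame.info()
# Count missing values per column.
boardgame.isna().sum()
# Look at the Domains values, including how many are missing.
boardgame.Domains.value_counts(dropna=False)
# Roughly 10,000 of the ~20,000 games have no Domains value. That is too many
# to fill in reasonably, so we drop the Domains column entirely.
boardgame.drop('Domains', axis=1, inplace=True)
# Drop the remaining rows that still contain missing values.
boardgame.dropna(inplace=True)
# Distribution of game complexity.
sns.histplot(boardgame['Complexity Average'], bins=25)
plt.show()
# Relationship between average rating and complexity.
sns.scatterplot(data=boardgame, y='Complexity Average', x='Rating Average')
plt.show()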
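The cleaning above drops `Domains` because about half its values are missing. That judgment can be written as a reusable rule (drop columns whose share of missing values exceeds a threshold, then drop incomplete rows), and the same pass can address the other two problems the challenge mentions: data types and outliers. The sketch below is illustrative only. It assumes numeric columns might be stored as text with comma decimal separators such as "8,79", which is one common way "inappropriate data types" show up; whether bgg_data.csv is encoded this way is an assumption to verify with `boardgame.info()`. The 0.4 threshold, the helper names, and the example rows are all made up.

```python
import pandas as pd


def clean_boardgames(df, max_missing_share=0.4, numeric_text_cols=()):
    """Coerce text-encoded numbers, drop sparse columns, then drop incomplete rows."""
    out = df.copy()
    for col in numeric_text_cols:
        s = out[col]
        if not pd.api.types.is_numeric_dtype(s):
            # Assumes commas are decimal separators ("8,79" -> "8.79"),
            # not thousands separators.
            s = s.str.replace(",", ".", regex=False)
        # Anything that still cannot be parsed becomes NaN.
        out[col] = pd.to_numeric(s, errors="coerce")
    # Fraction of missing values per column, between 0 and 1.
    missing_share = out.isna().mean()
    sparse_cols = missing_share[missing_share > max_missing_share].index
    out = out.drop(columns=sparse_cols)
    # Remove rows that still have any missing value.
    return out.dropna()


def iqr_outlier_mask(s, k=1.5):
    """True where a value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Made-up rows; the comma-decimal strings mimic one possible encoding.
raw = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Rating Average": ["8,79", "7,5", None, "6,1"],
    "Domains": [None, "Strategy Games", None, None],
})
cleaned = clean_boardgames(raw, numeric_text_cols=("Rating Average",))
print(cleaned)
print(iqr_outlier_mask(pd.Series([1, 2, 3, 4, 100])).tolist())
```

Here `Domains` (75% missing) is dropped, row C is removed, and the ratings become floats. Tukey's 1.5 × IQR rule is one common convention, not the only one; returning a mask rather than dropping rows lets you inspect the flagged games before deciding whether they are errors or genuinely unusual.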
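The scatterplot of complexity against rating gives a visual impression; a correlation coefficient can put a number on it. The sketch below computes Pearson (linear) and Spearman (rank, computed as Pearson on average ranks) correlations. It assumes both columns are numeric, and it optionally excludes games with a complexity of exactly 0 on the assumption that 0 means "no complexity votes" rather than "very simple"; that interpretation is a guess worth checking against the histogram. The toy frame is made up and says nothing about the real data.

```python
import pandas as pd


def complexity_rating_correlation(df, drop_zero_complexity=True):
    """Pearson and Spearman correlation between complexity and rating."""
    data = df[["Complexity Average", "Rating Average"]].dropna()
    if drop_zero_complexity:
        # Assumption: a complexity of 0 marks unrated games, not trivial ones.
        data = data[data["Complexity Average"] > 0]
    x, y = data["Complexity Average"], data["Rating Average"]
    return {
        "n": len(data),
        "pearson": x.corr(y),
        # Spearman = Pearson correlation of the average ranks.
        "spearman": x.rank().corr(y.rank()),
    }


# Made-up example values.
toy = pd.DataFrame({
    "Complexity Average": [0.0, 1.5, 2.0, 2.5, 3.0, 4.0],
    "Rating Average": [5.0, 6.0, 6.4, 6.9, 7.1, 7.8],
})
result = complexity_rating_correlation(toy)
print(result)
```

Reporting both coefficients is a deliberate choice: if Spearman is noticeably higher than Pearson, the relationship is monotonic but not linear, which a scatterplot with ~20,000 overlapping points can make hard to see.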