
Which board game should you play?

📖 Background

After a tiring week, what better way to unwind than a board game night with friends and family? But the question remains: which game should you pick? You have gathered a dataset with information on over 20,000 board games. It's time to put your analytical skills to work and use data-driven insights to persuade your group to try the game you've chosen!

import pandas as pd
boardgame = pd.read_csv('data/bgg_data.csv')
boardgame

💪 Challenge

Explore and analyze the board game data, and share the intriguing insights with your friends through a report. Here are some steps that might help you get started:

  • Is this dataset ready for analysis? Some variables have inappropriate data types, and there are outliers and missing values. Apply data cleaning techniques to preprocess the dataset.
  • Use data visualization techniques to draw further insights from the dataset.
  • Find out if the number of players impacts the game's average rating.
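The cleaning step above could start along these lines. This is only a sketch under assumptions: it supposes that a numeric column may have been read as text because it uses a decimal comma, and it flags outliers with the common 1.5 × IQR rule. The helper names `to_numeric_column` and `iqr_outlier_mask`, the `Play Time` column, and the toy values are all hypothetical and not taken from bgg_data.csv.

```python
import pandas as pd


def to_numeric_column(series: pd.Series) -> pd.Series:
    """Convert a text column to floats, treating ',' as the decimal mark.

    Values that cannot be parsed become NaN instead of raising an error.
    """
    as_text = series.astype(str).str.replace(",", ".", regex=False)
    return pd.to_numeric(as_text, errors="coerce")


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy data for illustration only; these are not values from bgg_data.csv.
toy = pd.DataFrame({
    "Rating Average": ["8,79", "7,5", "n/a", "6,25"],  # assumed text format
    "Play Time": [30, 45, 60, 6000],                   # assumed column name
})
toy["Rating Average"] = to_numeric_column(toy["Rating Average"])
toy["is_outlier"] = iqr_outlier_mask(toy["Play Time"])
print(toy)
```

Flagging outliers in a separate column, rather than deleting them right away, leaves room to inspect the flagged rows first and decide whether they are data errors or genuinely unusual games.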

🧑‍⚖️ Judging criteria

This is a community-based competition. The top 5 most upvoted entries will win.

The winners will receive DataCamp merchandise.

✅ Checklist before publishing into the competition

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your story.
  • Make sure the workbook reads well and explains how you found your insights.
  • Try to include an executive summary of your recommendations at the beginning.
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

import matplotlib.pyplot as plt
import seaborn as sns

import numpy as np

import nltk
from sklearn.model_selection import train_test_split
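# Inspect column dtypes and non-null counts.
boardgame.info()
# Count missing values per column.
boardgame.isna().sum()
boardgame.Domains.value_counts(dropna=False)
# Roughly 10,000 of the ~20,000 rows have no Domains value, which is too many to impute reasonably, so drop the column.
boardgame = boardgame.drop(columns='Domains')
# Drop the remaining rows that contain any missing value.
boardgame = boardgame.dropna()
# Distribution of complexity scores.
sns.histplot(boardgame['Complexity Average'], bins=25)
plt.show()
# Complexity plotted against average rating.
sns.scatterplot(data=boardgame, y='Complexity Average', x='Rating Average')
plt.show()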
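One possible way to approach the challenge question about player count and rating is to summarise ratings per player count and compute a rank correlation. The sketch below is not part of the original notebook: the `Max Players` column name is an assumption about the dataset, the helpers `rating_by_players` and `spearman` are hypothetical, and the toy values are made up for illustration.

```python
import pandas as pd


def rating_by_players(df: pd.DataFrame, players_col: str, rating_col: str) -> pd.DataFrame:
    """Summarise ratings for each distinct player count."""
    return (
        df.groupby(players_col)[rating_col]
        .agg(games="count", mean_rating="mean", median_rating="median")
        .reset_index()
    )


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return x.rank().corr(y.rank())


# Toy data for illustration only; not taken from bgg_data.csv.
toy = pd.DataFrame({
    "Max Players": [2, 2, 4, 4, 6, 6],            # assumed column name
    "Rating Average": [7.0, 8.0, 6.5, 7.5, 6.0, 6.5],
})
summary = rating_by_players(toy, "Max Players", "Rating Average")
rho = spearman(toy["Max Players"], toy["Rating Average"])
print(summary)
print(f"Spearman rho: {rho:.2f}")
```

A rank correlation is used here because player counts are discrete and ratings may be skewed. Even a clear correlation would only show an association, so it is worth checking whether other variables, such as complexity, explain the pattern.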