For those of us who are really into highly complex board games, it can be hard to find what to play next. Here I will try to find the board games that combine the highest complexity with the highest rating.
import pandas as pd
boardgame = pd.read_csv('data/bgg_data.csv')
boardgame
Data Cleaning
# Import all necessary libraries
import pandas as pd
# Read the dataset
bg = pd.read_csv('data/bgg_data.csv')
print(bg.head())
# Check for any missing info
print(bg.isna().sum())
# Drop entries with missing figures
bg = bg.dropna()
# Cast each column to the type it should have
bg['ID'] = bg['ID'].astype(int)  # IDs are whole numbers, so int rather than float
bg['Name'] = bg['Name'].astype(str)
bg['Year Published'] = bg['Year Published'].astype(int)
bg['Min Players'] = bg['Min Players'].astype(int)
bg['Max Players'] = bg['Max Players'].astype(int)
bg['Play Time'] = bg['Play Time'].astype(int)
bg['Min Age'] = bg['Min Age'].astype(int)
bg['Users Rated'] = bg['Users Rated'].astype(int)
bg['Rating Average'] = bg['Rating Average'].astype(float)
bg['BGG Rank'] = bg['BGG Rank'].astype(int)
bg['Complexity Average'] = bg['Complexity Average'].astype(float)
bg['Owned Users'] = bg['Owned Users'].astype(float)
bg['Mechanics'] = bg['Mechanics'].astype(str)
bg['Domains'] = bg['Domains'].astype(str)
# Remove the space in between mechanics
bg['Mechanics'] = bg['Mechanics'].str.replace(' ','')
# Check the dtypes
print(bg.dtypes)
Analyzing the Relationship Between Mechanics and Complexity Rating
To figure out which mechanics tend to go along with a high complexity rating, we can analyze the data in the bg DataFrame. We will first filter the games that have a high complexity rating and then examine the mechanics associated with those games.
Let's start by filtering the games with a high complexity rating.
# Filter games with high complexity rating
high_complexity_games = bg[bg['Complexity Average'] > 4.0]
# Display the filtered games
high_complexity_games[['Name', 'Complexity Average']]
The above code filters the games from the bg DataFrame that have a complexity rating greater than 4.0. We then display the names of these games along with their complexity ratings.
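The 4.0 cutoff is a fixed choice, so it can help to see how it compares with a cutoff taken from the data itself. The sketch below uses a tiny made-up table (the toy DataFrame and its game names are assumptions for illustration, not rows from the dataset) and compares the fixed filter with an 80th-percentile filter.

```python
import pandas as pd

# Toy stand-in for the bg DataFrame; names and values are invented for illustration.
toy = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C', 'Game D', 'Game E'],
    'Complexity Average': [1.8, 2.9, 4.0, 4.3, 4.7],
})

# Fixed cutoff, same comparison as the filter above (strictly greater than 4.0).
fixed = toy[toy['Complexity Average'] > 4.0]

# Alternative (an assumption, not the method used in this notebook):
# a data-driven cutoff at the 80th percentile of the complexity column.
cutoff = toy['Complexity Average'].quantile(0.8)
relative = toy[toy['Complexity Average'] >= cutoff]

print(len(fixed), 'games above 4.0')
print(len(relative), 'games at or above the 80th percentile')
```

Note that a game rated exactly 4.0 is excluded by the strict comparison; use >= if it should be included.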
Next, let's examine the mechanics associated with these high complexity games.
# Extract the mechanics from high complexity games
high_complexity_mechanics = high_complexity_games['Mechanics'].str.split(',').explode()
high_complexity_mechanics = high_complexity_mechanics.str.replace(' ', '')
# Count the occurrence of each mechanic
mechanics_count = high_complexity_mechanics.value_counts()
# Display the mechanics and their occurrence
mechanics_count.head(5)
The above code splits the Mechanics column of the high_complexity_games DataFrame into individual mechanics, one per row. We then count how often each mechanic occurs and display the five most frequent ones.
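To make the split, explode, and count steps concrete, here is a minimal sketch on a three-row toy table (the toy DataFrame is an assumption for illustration, not part of the dataset). It also divides the counts by the number of games, which gives the share of games that use each mechanic.

```python
import pandas as pd

# Toy stand-in for high_complexity_games; values are invented for illustration.
toy = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C'],
    'Mechanics': ['DiceRolling,HexagonGrid', 'DiceRolling,Simulation', 'DiceRolling'],
})

# str.split turns each string into a list; explode gives one row per list item.
exploded = toy['Mechanics'].str.split(',').explode()

# value_counts tallies how many games list each mechanic.
counts = exploded.value_counts()

# Share of games using each mechanic (count divided by number of games).
share = counts / len(toy)

print(counts)
print(share)
```

A mechanic that is common across all games will also top this count, so a high count alone does not show that the mechanic is specific to complex games.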
Finally, for each of the five most common mechanics among the high complexity games, let's list the games with the highest rating average. The five mechanics are:
Simulation, Hexagon Grid, Dice Rolling, Grid Movement, Variable Player Powers
# List the top 5 mechanics
top_complex_mechanics = ['DiceRolling','HexagonGrid','Simulation','GridMovement','VariablePlayerPowers']
# Display the top 5 games with highest rating average for each mechanic
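for mechanic in top_complex_mechanics:
    # Boolean mask: True where the Mechanics string contains this mechanic (plain text match)
    game_with_mechanic = bg['Mechanics'].str.contains(mechanic, regex=False)
    # Keep the 5 best-rated games among those that match
    top_games = bg[game_with_mechanic].nlargest(5, 'Rating Average')[['Name', 'Rating Average', 'Domains']]
    print(f"Top 5 games with {mechanic}:")
    print(top_games)
    print("\n")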
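One caveat with the loop above: str.contains does substring matching, so a mechanic name that happens to be part of a longer name would also match. The five mechanics used here may not be affected, but a more defensive variant compares whole names instead. The sketch below uses made-up mechanic names (the toy DataFrame and its names are assumptions chosen to show the difference).

```python
import pandas as pd

# Toy data; the mechanic names are invented so that one contains the other.
toy = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C'],
    'Mechanics': ['Trading,DiceRolling', 'StockTrading', 'HexagonGrid'],
})

mechanic = 'Trading'

# Substring match: also hits 'StockTrading'.
substring_mask = toy['Mechanics'].str.contains(mechanic, regex=False)

# Exact match: split into a list of names and test membership.
exact_mask = toy['Mechanics'].str.split(',').apply(lambda names: mechanic in names)

print(toy.loc[substring_mask, 'Name'].tolist())
print(toy.loc[exact_mask, 'Name'].tolist())
```

The exact-match mask can be dropped into the loop in place of game_with_mechanic without changing the rest of the code.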