#Import packages and get environment set up
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
catan = pd.read_csv('my-settlers-of-catan-games-catanstats.csv')
Plan for analysing the Catan data, and the insights we want to extract:
- Describe the data that we are looking at
- Do the dice rolls match up to expectations?
- What is the most important resource in the game?
- Is there a starting position that leads to more victories?
print(catan)
print(catan[['points','tradeGain','totalGain','totalLoss','totalAvailable']])
What data are we looking at?
- index: numerical value of row
- gameNum: game number (1-50) (Numeric)
- player: The player (1-4) (Numeric)
- points: The player's points. (Numeric)
- 2: The number of 2s rolled. (Numeric)
- 3: The number of 3s rolled. (Numeric)
- 4: The number of 4s rolled. (Numeric)
- 5: The number of 5s rolled. (Numeric)
- 6: The number of 6s rolled. (Numeric)
- 7: The number of 7s rolled. (Numeric)
- 8: The number of 8s rolled. (Numeric)
- 9: The number of 9s rolled. (Numeric)
- 10: The number of 10s rolled. (Numeric)
- 11: The number of 11s rolled. (Numeric)
- 12: The number of 12s rolled. (Numeric)
- settlement1: The first settlement. (Numeric)
- settlement2: The second settlement. (Numeric)
- production: The production. (Numeric)
- tradeGain: The trade gain. (Numeric)
- robberCardsGain: The robber cards gain. (Numeric)
- totalGain: The total gain. (Numeric)
- tradeLoss: The trade loss. (Numeric)
- robberCardsLoss: The robber cards loss. (Numeric)
- tribute: The tribute. (Numeric)
- totalLoss: The total loss. (Numeric)
- totalAvailable: The total available. (Numeric)
#plot the players who won each game and by frequency
winners = catan[catan['points']>=10]
#plt.style.use('Solarize_Light2')
sns.set_style('darkgrid')
#count the wins per player once, then plot and print the same result
wins_by_player = winners.groupby('player')['player'].count()
wins_by_player.plot(kind='bar', yticks=np.arange(0,22,2), xlabel='Player', ylabel='Number of Wins', title='Wins per Player', rot=0, color='gold')
print(wins_by_player)
This only shows how many times each player number won. We do not yet know who each player number refers to.
#How often did OP win
#Assumption: OP is player 1 (the data does not say which player number OP is)
op_win_percentage = (wins_by_player.get(1, 0) / wins_by_player.sum()) * 100
print(op_win_percentage)
#Top Win percentage (player 2)
top = (wins_by_player.max() / wins_by_player.sum()) * 100
print(top)
Player 2 won 36% of the time, which is above the 25% we would expect if every player were equally likely to win. Let's look into Player 2 and see if they were implementing a specific strategy.
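Whether 36% is meaningfully above 25% depends on how many games were played. As a rough sketch, assuming 50 games in total (so 36% corresponds to 18 wins) and that each game is an independent trial with a 25% chance of a win, the probability of seeing at least that many wins by chance is a binomial tail. The helper name `binomial_tail` and the variables `n_games` and `n_wins` are illustrative and not part of the notebook.

```python
from math import comb

def binomial_tail(wins, games, p):
    """Probability of at least `wins` successes in `games` independent trials."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# Assumption: 50 games were played, so a 36% win rate means 18 wins.
n_games = 50
n_wins = 18

chance_probability = binomial_tail(n_wins, n_games, 0.25)
print(f"P(at least {n_wins} wins in {n_games} games by chance) = {chance_probability:.3f}")
```

A small probability here would suggest the win rate is unlikely to be luck alone, although with only 50 games the evidence is limited either way.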
But first, let's take a look at the dice rolls and see if the dice were fair.
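The expected probabilities for each total (1/36 for a 2, up to 1/6 for a 7) come from counting the ways two fair six-sided dice can produce that total. A minimal sketch of that count, using illustrative names (`ways_to_roll`, `dice_probabilities`) that are not part of the notebook:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) outcomes give each total.
ways_to_roll = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Convert the counts to exact probabilities for totals 2 through 12.
dice_probabilities = {
    total: Fraction(count, 36) for total, count in sorted(ways_to_roll.items())
}

for total, p in dice_probabilities.items():
    print(total, p)
```

Multiplying each probability by the total number of rolls gives the expected count for that total.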
#probability of each two-dice total, from 2 through 12
numb = [(1/36),(1/18),(1/12),(1/9),(5/36),(1/6),(5/36),(1/9),(1/12),(1/18),(1/36)]
#expected count of each total, using a hard-coded total of 3214 rolls
#keys are strings so that they line up with the dice column names
data = {str(total): p*3214 for total, p in zip(range(2,13), numb)}
prob = pd.DataFrame(data=data, index=[1,2])
#both rows are identical, so either one serves as the expected model
model = prob.iloc[1,:]
#exploring dice rolls
#let's grab the dice columns
dice_rolls = catan[catan['player'] == 2].iloc[:,5:16]
#dice_rolls['gameNum'] = catan[catan['player'] == 2]['gameNum']
diff = dice_rolls.sum() - model
plt.style.use('ggplot')
dice_rolls.T.plot(kind='bar', stacked=True, legend=False, colormap='plasma', width = 0.7, alpha=0.9)
plt.xlabel('Number Rolled')
plt.ylabel('Frequency Rolled')
plt.title('How often each number was rolled')
model.plot(kind='bar', color='none', edgecolor='black')
plt.show()
#total rolls to compare to expected values
total_per_num = dice_rolls.sum()
print(total_per_num)
total_rolls = total_per_num.sum()
print(total_rolls)
diff = dice_rolls.sum() - model
diff.plot(kind='bar')
plt.ylabel('Difference in number of rolls (Actual vs Expected)')
plt.xlabel('Number Rolled')
plt.title('Difference in Actual Rolls vs Expected Rolls')
From the chart above, 4 came up less often than expected, so it was not the number to be on in this sample.
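A bar chart of differences does not say whether the gaps are larger than chance would produce. One common way to check is a chi-square goodness-of-fit statistic, sketched below. The rolls here are simulated stand-in data, not the Catan data, and the names `chi_square_statistic`, `fair_probs` and `simulated_totals` are illustrative.

```python
import numpy as np

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(((observed - expected) ** 2 / expected).sum())

# Fair two-dice probabilities for totals 2 through 12.
fair_probs = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]) / 36

# Simulated stand-in data (NOT the Catan rolls): roll two fair dice n_rolls times.
rng = np.random.default_rng(0)
n_rolls = 3214
simulated_totals = rng.integers(1, 7, size=n_rolls) + rng.integers(1, 7, size=n_rolls)

# Count how often each total 2..12 occurred (drop the unused bins 0 and 1).
observed_counts = np.bincount(simulated_totals, minlength=13)[2:]
expected_counts = fair_probs * n_rolls

stat = chi_square_statistic(observed_counts, expected_counts)
print(f"chi-square statistic: {stat:.2f} (10 degrees of freedom)")
```

With 11 categories there are 10 degrees of freedom, and a statistic above roughly 18.3 would be significant at the 5% level. In the notebook, the same function could be called as `chi_square_statistic(total_per_num, model)`, provided the expected counts in `model` sum to the same total as the observed rolls.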