Settlers of Catan Data

#Import packages and set up the environment

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

catan = pd.read_csv('my-settlers-of-catan-games-catanstats.csv')

Plan for analysing the Catan data, and the insights we hope to extract:

Describe the data we are looking at
Do the dice rolls match expectations?
What is the most important resource in the game?
Is there a starting position that leads to more victories?

print(catan)

print(catan[['points','tradeGain','totalGain','totalLoss','totalAvailable']])

What data are we looking at?

index: The numerical value of the row.
gameNum: The game number (1-50). (Numeric)
player: The player (1-4). (Numeric)
points: The player's points. (Numeric)
2: The number of 2s rolled. (Numeric)
3: The number of 3s rolled. (Numeric)
4: The number of 4s rolled. (Numeric)
5: The number of 5s rolled. (Numeric)
6: The number of 6s rolled. (Numeric)
7: The number of 7s rolled. (Numeric)
8: The number of 8s rolled. (Numeric)
9: The number of 9s rolled. (Numeric)
10: The number of 10s rolled. (Numeric)
11: The number of 11s rolled. (Numeric)
12: The number of 12s rolled. (Numeric)
settlement1: The first settlement. (Numeric)
settlement2: The second settlement. (Numeric)
production: The production. (Numeric)
tradeGain: The trade gain. (Numeric)
robberCardsGain: The robber cards gain. (Numeric)
totalGain: The total gain. (Numeric)
tradeLoss: The trade loss. (Numeric)
robberCardsLoss: The robber cards loss. (Numeric)
tribute: The tribute. (Numeric)
totalLoss: The total loss. (Numeric)
totalAvailable: The total available. (Numeric)

#Plot how many games each player won

winners = catan[catan['points']>=10]
#plt.style.use('Solarize_Light2')
sns.set_style('darkgrid')
winners.groupby('player')['player'].count().plot(kind='bar', yticks=np.arange(0,22,2), xlabel='Player', ylabel='Number of Wins', title='Wins per Player', rot=0, color='gold')

wins_by_player = winners.groupby('player')['player'].count()
print(wins_by_player)

All this shows is how many times each player number won. We do not yet know who each player number refers to.

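#How often did OP win
#Assumption: OP is player 1 (the first entry of wins_by_player); the data does not confirm this

op_win_percentage = (wins_by_player.iloc[0] / wins_by_player.sum()) * 100
print(op_win_percentage)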

#Top Win percentage (player 2)

top = (wins_by_player.max() / wins_by_player.sum()) * 100
print(top)

Player 2 won 36% of the time, well above the 25% we would expect if each of the four players were equally likely to win. Let's look into Player 2 and see whether they were following a specific strategy.
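One rough way to judge whether 36% is surprising is a binomial tail probability. The sketch below assumes 50 games with exactly one winner each (so 36% corresponds to 18 wins) and a 25% chance for any given player to win any given game. Both are assumptions rather than facts confirmed by the data, and `binomial_tail` is a helper name made up for illustration.

```python
from math import comb

def binomial_tail(n, k, p):
    """Probability of at least k successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumed values: 50 games, 18 wins (36%), 25% baseline chance of winning a game
p_value = binomial_tail(50, 18, 0.25)
print(f"P(at least 18 wins in 50 games) = {p_value:.3f}")
```

Because Player 2 was singled out after the fact as the best of four players, this probability understates how often some player would reach that many wins by chance, so it should be read as a loose guide only.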

But first, let's take a look at the dice rolls and see whether the dice were fair.


#Probability of each total (2-12) with two fair dice
numb = [(1/36),(1/18),(1/12),(1/9),(5/36),(1/6),(5/36),(1/9),(1/12),(1/18),(1/36)]
#Expected count of each total over 3214 rolls, keyed by the dice column names '2'..'12'
totals = [str(n) for n in range(2, 13)]
data = {t: p * 3214 for t, p in zip(totals, numb)}

prob = pd.DataFrame(data=data, index=[1,2])
model = prob.iloc[1,:]
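The fractions in `numb` can be checked by enumerating all 36 equally likely outcomes of two six-sided dice and counting how many give each total. This is a small standalone sketch, separate from the notebook's own variables:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) outcomes give each total
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Exact probability of each total, from 2 to 12
probabilities = {total: Fraction(count, 36) for total, count in sorted(ways.items())}

for total, p in probabilities.items():
    print(total, p)
```

Using `Fraction` keeps the values exact, so they can be compared directly against 1/36, 1/18, 1/12 and so on.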



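#exploring dice rolls
#let's grab the dice columns by name
#filtering on player 2 keeps one row per game (assumption: a game's roll counts repeat on every player's row)

dice_cols = [str(n) for n in range(2, 13)]
dice_rolls = catan.loc[catan['player'] == 2, dice_cols]
#dice_rolls['gameNum'] = catan[catan['player'] == 2]['gameNum']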

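plt.style.use('ggplot')
#stacked bars: one segment per game, so each bar's height is the total count for that number
ax = dice_rolls.T.plot(kind='bar', stacked=True, legend=False, colormap='plasma', width=0.7, alpha=0.9)
#overlay the expected counts as black outlines on the same axes
model.plot(kind='bar', color='none', edgecolor='black', ax=ax)
ax.set_xlabel('Number Rolled')
ax.set_ylabel('Frequency Rolled')
ax.set_title('How often each number was rolled')
plt.show()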

#total rolls to compare to expected values
total_per_num = dice_rolls.sum()
print(total_per_num)
total_rolls = total_per_num.sum()
print(total_rolls)

diff = dice_rolls.sum() - model
diff.plot(kind='bar')
plt.ylabel('Difference in number of rolls (Actual vs Expected)')
plt.xlabel('Number Rolled')
plt.title('Difference in Actual Rolls vs Expected Rolls')

From the chart above, 4 came up less often than expected, so it was not the number to be on in this sample.
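Eyeballing the difference chart only goes so far. A chi-square goodness-of-fit statistic sums up in one number how far the rolls stray from the fair-dice model. The sketch below uses made-up counts rather than the real data: the `observed` values are purely illustrative, and `chi_square_statistic` is a hypothetical helper. The cutoff of 18.307 is the standard 5% critical value for 10 degrees of freedom (11 possible totals minus 1).

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Number of ways to roll each total from 2 to 12 with two dice (out of 36)
ways = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

# Made-up counts for totals 2..12, for illustration only (not the Catan data)
observed = [10, 22, 25, 41, 52, 58, 49, 38, 31, 21, 13]

total_rolls = sum(observed)
expected = [total_rolls * w / 36 for w in ways]

stat = chi_square_statistic(observed, expected)
CRITICAL_5_PERCENT_DF10 = 18.307  # chi-square critical value, df = 10, alpha = 0.05

print(f"chi-square statistic = {stat:.2f}")
if stat < CRITICAL_5_PERCENT_DF10:
    print("No evidence against fair dice at the 5% level")
else:
    print("Rolls differ from the fair-dice model at the 5% level")
```

With the notebook's data, the values of `total_per_num` and `model` could be passed in place of `observed` and `expected`, provided both are ordered from 2 to 12.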