
You're working as a sports journalist at a major online sports media company, specializing in soccer analysis and reporting. You've been watching both men's and women's international soccer matches for a number of years, and your gut instinct tells you that more goals are scored in women's international football matches than men's. This would make an interesting investigative article that your subscribers are bound to love, but you'll need to perform a valid statistical hypothesis test to be sure!

While scoping this project, you acknowledge that the sport has changed considerably over the years and that performances likely vary by tournament, so you decide to limit the analysis to official FIFA World Cup matches (not including qualifiers) played since 2002-01-01.

You create two datasets containing the results of every official men's and women's international football match since the 19th century, which you scraped from a reliable online source. This data is stored in two CSV files: women_results.csv and men_results.csv.

The question you want to answer is:

Are more goals scored in women's international soccer matches than men's?

You assume a 10% significance level, and use the following null and alternative hypotheses:

$H_0$: The mean number of goals scored in women's international soccer matches is the same as men's.

$H_A$: The mean number of goals scored in women's international soccer matches is greater than men's.
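
In symbols, the two hypotheses can be written compactly. This is only a restatement: $\mu_W$ and $\mu_M$ are notation introduced here (not in the original brief) for the mean total goals per match in the women's and men's samples, and $\alpha$ is the 10% significance level stated above.

```latex
H_0:\ \mu_W = \mu_M
\qquad \text{vs.} \qquad
H_A:\ \mu_W > \mu_M,
\qquad \alpha = 0.10
```

Because $H_A$ points in one direction only, any test used later should be one-sided.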

# Importing the Correct Environment (With the tools I am comfortable with)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# We are running an unpaired two-sample test, so we need either a parametric or a non-parametric test
# To check normality we'll use the Shapiro-Wilk test, since n < 5000
from scipy.stats import shapiro

# Independent Samples t-test (Parametric)
from scipy.stats import ttest_ind
# Wilcoxon-Mann-Whitney test (Non-Parametric)
from scipy.stats import mannwhitneyu
# Begin data imports
mens_soccer_filename = 'men_results.csv'
womens_soccer_filename = 'women_results.csv'

# Men's and women's DataFrames
men_df = pd.read_csv(mens_soccer_filename)
women_df = pd.read_csv(womens_soccer_filename)
# Filtering to only 'FIFA World Cup' matches dated on or after '2002-01-01'
women_df_filtered = women_df.copy()
women_df_filtered['date'] = pd.to_datetime(women_df_filtered['date'])
# Build the mask from the converted copy so the date comparison uses datetimes, not raw strings
women_df_filtered = women_df_filtered[(women_df_filtered['tournament'].isin(['FIFA World Cup'])) & (women_df_filtered['date'] >= '2002-01-01')].copy()
women_df_filtered['group_id'] = 'womens'

# Women's df preview
print(len(women_df_filtered))
display(women_df_filtered.head())
# Filtering to only 'FIFA World Cup' matches dated on or after '2002-01-01'
men_df_filtered = men_df.copy()
men_df_filtered['date'] = pd.to_datetime(men_df_filtered['date'])
# Filter the converted copy (not the raw men_df) so the datetime conversion is kept
men_df_filtered = men_df_filtered[(men_df_filtered['tournament'].isin(['FIFA World Cup'])) & (men_df_filtered['date'] >= '2002-01-01')].copy()
men_df_filtered['group_id'] = 'mens'

# Men's df preview
print(len(men_df_filtered))
display(men_df_filtered.head())
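
The two filtering cells above repeat the same steps, so they could be folded into one helper. The following is a minimal sketch, assuming the `date` and `tournament` columns used above; the function name `filter_world_cup` and the toy frame are hypothetical and made up purely for illustration, not taken from the scraped data.

```python
import pandas as pd

def filter_world_cup(df, start='2002-01-01', label=None):
    """Hypothetical helper: keep FIFA World Cup rows dated on or after `start`."""
    out = df.copy()
    out['date'] = pd.to_datetime(out['date'])
    mask = (out['tournament'] == 'FIFA World Cup') & (out['date'] >= pd.Timestamp(start))
    out = out.loc[mask].copy()  # explicit copy avoids chained-assignment warnings
    if label is not None:
        out['group_id'] = label
    return out

# Made-up rows, only to exercise the filter
toy = pd.DataFrame({
    'date': ['1999-05-01', '2003-06-10', '2015-07-02', '2016-03-01'],
    'tournament': ['FIFA World Cup', 'FIFA World Cup', 'FIFA World Cup', 'Friendly'],
    'home_score': [1, 0, 2, 1],
    'away_score': [0, 2, 2, 1],
})
toy_filtered = filter_world_cup(toy, label='toy')
print(len(toy_filtered))
```

With a helper like this, the men's and women's frames would each be filtered in one call, and the converted `date` column could not be dropped by accident.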
# Concatenate the two groups
merged_football = pd.concat([men_df_filtered, women_df_filtered], axis=0, ignore_index=True)

# Prints True if no rows were lost during concatenation
print(len(merged_football) == len(men_df_filtered) + len(women_df_filtered))

# Add a total score
merged_football['total_score'] = merged_football['home_score'] + merged_football['away_score']

# Preview the merged data
display(merged_football.head())
# Check for normality in subgroups
sns.displot(data=merged_football, x='total_score', hue='group_id', kind='kde', alpha=0.5)

From the kernel density estimate plot, the women's total-score distribution does not look normally distributed, and we can check that more formally with a test.

We will run the Shapiro-Wilk test on both subsets and on the whole set to assess normality.

$H_0$: The data is normally distributed.

$H_A$: The data is not normally distributed.

# Shapiro-Wilk Tests

# Creating our groups
women_group = merged_football[merged_football['group_id'] == 'womens']['total_score']
men_group = merged_football[merged_football['group_id'] == 'mens']['total_score']

# Running the tests
norm_women = shapiro(women_group)
norm_men = shapiro(men_group)
norm_test = shapiro(merged_football['total_score'])

print(f'womens p-value: {norm_women.pvalue}\nmens p-value: {norm_men.pvalue}\nboth p-value: {norm_test.pvalue}')

Since the Shapiro-Wilk (normality) tests came out with $p < 0.05$, we reject $H_0$ and reasonably assume the data is not normally distributed.
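
A rejection here is not surprising, because goal totals are small non-negative integers. The sketch below runs the same test on simulated counts. It assumes, purely for illustration, that match totals behave roughly like Poisson draws; the rate `2.5`, the sample size, and the seed are made-up values, not estimates from the scraped data.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)

# Simulated match totals (an assumption, NOT the real dataset)
simulated_goals = rng.poisson(lam=2.5, size=300)

result = shapiro(simulated_goals)
print(f'W = {result.statistic:.3f}, p-value = {result.pvalue:.3g}')

# Same decision rule as in the text: reject normality when p < 0.05
reject_normality = result.pvalue < 0.05
print('reject normality' if reject_normality else 'fail to reject normality')
```

Discrete, right-skewed counts like these tend to fail a normality test at moderate sample sizes, which is one reason to prefer a rank-based test for the comparison.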

We will now move on to our non-parametric tool for an unpaired two-sample test, also known as the Wilcoxon-Mann-Whitney test.

Note: The alternative hypothesis is that women's matches have MORE goals than men's, not merely a different number, so we use a 'right-sided' test: alternative='greater', with the women's group passed as the first sample.
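
The direction of a one-sided test depends on which sample is passed first. The following sketch uses two small made-up samples (not match data) to show how `alternative='greater'` and `alternative='less'` behave in `scipy.stats.mannwhitneyu` when the first sample is clearly larger.

```python
from scipy.stats import mannwhitneyu

# Made-up samples: every value in `high` exceeds every value in `low`
high = [4, 5, 6, 7, 8, 9]
low = [0, 1, 2, 3, 3, 2]

# H_A: values in x tend to be larger than values in y
res_greater = mannwhitneyu(x=high, y=low, alternative='greater')
# H_A: values in x tend to be smaller than values in y
res_less = mannwhitneyu(x=high, y=low, alternative='less')

print(f"greater: p = {res_greater.pvalue:.4f}")
print(f"less:    p = {res_less.pvalue:.4f}")
```

With `x=high`, only the 'greater' alternative gives a small p-value. Swapping the order of the samples without also swapping the alternative would test the opposite claim.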

# Wilcoxon-Mann-Whitney Test (rank-sum test)
stat, pvalue = mannwhitneyu(x=women_group, y=men_group, alternative='greater')

# Pull the p-value
p_val = pvalue
print(stat, pvalue)
# 'fail to reject' if the p-value is above the 10% significance level, else 'reject'
rejection_letter = lambda x: 'fail to reject' if x > 0.1 else 'reject'

# Run lambda function for our result.
result = rejection_letter(pvalue) 

result_dict = {'p_val': p_val, 'result': result}

print(result_dict, len(merged_football))