Exploring World Cup Data in Python (Solution)
Exploring World Cup Data in Python
This dataset (source) includes 44,066 results of international football matches, from the very first official match in 1872 up to 2022. The matches range from the FIFA World Cup to the FIFI Wild Cup to regular friendly matches. All matches are men's full internationals; the data does not include Olympic Games matches or matches where at least one of the teams was the nation's B-team, a U-23 team, or a league select team.
Task 1: Import and prepare the dataset
- Import the pandas package with the usual alias.
# Import the pandas package with the usual alias
import pandas as pd
- Read "results.csv". Assign to results.
- Convert the date column to a datetime.
- Get the year component of the date column; store in a new column named year.
# Read results.csv. Assign to results.
results = pd.read_csv('results.csv')
# Convert the date column to a datetime
results['date'] = pd.to_datetime(results['date'])
# Get the year component of date column; store in a new column named year
results['year'] = results['date'].dt.year
# See the result
results
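The date conversion above can be tried without the real file. The following is a minimal sketch that assumes a small hand-written CSV in place of results.csv; only the date and tournament column names come from the task text, and the home_team and away_team columns are assumptions for illustration.

```python
import io

import pandas as pd

# Hand-written sample standing in for results.csv (not the real file).
# Only `date` and `tournament` are column names confirmed by the text;
# `home_team` and `away_team` are assumed for illustration.
sample_csv = io.StringIO(
    "date,home_team,away_team,tournament\n"
    "1872-11-30,Scotland,England,Friendly\n"
    "1930-07-13,France,Mexico,FIFA World Cup\n"
    "2022-12-18,Argentina,France,FIFA World Cup\n"
)

sample = pd.read_csv(sample_csv)

# read_csv leaves `date` as plain strings; to_datetime turns them into
# datetime64 values, which is what makes the .dt accessor available.
sample["date"] = pd.to_datetime(sample["date"])
sample["year"] = sample["date"].dt.year

print(sample.dtypes)
print(sample[["date", "year"]])
```

Checking dtypes after the conversion is a quick way to confirm that date is no longer a string column before relying on .dt.year.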
Task 2: Get the FIFA World Cup data
- Using results, count the number of rows of each tournament value.
- Convert the results to a DataFrame for nicer printing.
# Count the number of rows for each tournament; convert to DataFrame
results.value_counts("tournament").to_frame("num_matches")
- Query for the rows where tournament is equal to "FIFA World Cup"
# Query for the rows where tournament is equal to "FIFA World Cup"
world_cup_res = results \
.query('tournament == "FIFA World Cup"')
# See the results
world_cup_res
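To see what the counting and filtering steps do, here is a small sketch on a made-up DataFrame. The name toy and its values are hypothetical and exist only for illustration; it also shows that a boolean mask selects the same rows as .query().

```python
import pandas as pd

# Hypothetical miniature of `results`; the values are illustrative only.
toy = pd.DataFrame({
    "tournament": [
        "Friendly", "FIFA World Cup", "Friendly",
        "FIFA World Cup", "Friendly",
    ],
    "year": [1872, 1930, 1931, 1934, 1935],
})

# One row per distinct tournament, sorted by count in descending order.
counts = toy.value_counts("tournament").to_frame("num_matches")
print(counts)

# .query() and a boolean mask are two ways to express the same filter.
by_query = toy.query('tournament == "FIFA World Cup"')
by_mask = toy[toy["tournament"] == "FIFA World Cup"]
print(by_query.equals(by_mask))
```

Either filtering style works; .query() tends to read more cleanly in a method chain, while the mask form avoids quoting a string inside a string.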
Task 3: Your turn: How many matches in every World Cup?
- Using world_cup_res, count the number of rows of each year value.
- Convert the results to a DataFrame for nicer printing.
# Count the number of rows for each year; convert to DataFrame
matches_per_year = world_cup_res \
.value_counts("year") \
.to_frame("num_matches")
# See the results
matches_per_year
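Note that value_counts() orders rows by count, largest first, so the years will not appear chronologically. A minimal sketch of restoring year order with sort_index(), using a hypothetical toy_wc DataFrame whose years and counts are made up for illustration:

```python
import pandas as pd

# Hypothetical World Cup rows; the years and counts are invented.
toy_wc = pd.DataFrame({"year": [1934, 1930, 1934, 1930, 1934, 1938]})

# Default ordering: most frequent year first.
by_count = toy_wc.value_counts("year").to_frame("num_matches")

# Sorting by the index puts the years in chronological order instead.
by_year = by_count.sort_index()

print(by_count)
print(by_year)
```

Chronological order is usually the more natural choice if the counts are later plotted against year.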
- Import the plotly.express package using the alias px.