Live Training: Exploring World Cup Data in Python
This dataset (source) includes 44,066 results of international football matches starting from the very first official match in 1872 up to 2022. The matches range from FIFA World Cup to FIFI Wild Cup to regular friendly matches. The matches are strictly men's full internationals and the data does not include Olympic Games or matches where at least one of the teams was the nation's B-team, U-23 or a league select team.
Task 1: Import and prepare the dataset
- Import the pandas package with the usual alias.
# Import the pandas package with the usual alias
import pandas as pd
- Read "results.csv". Assign to results.
- Convert the date column to a datetime.
- Get the year component of the date column; store in a new column named year.
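Here is a minimal, self-contained sketch of the three preparation steps above. It assumes a tiny in-memory CSV with only date and tournament columns. The rows are illustrative toy values, not taken from results.csv. The sketch also shows read_csv's parse_dates parameter as an optional shortcut for the datetime conversion.

```python
import io

import pandas as pd

# Illustrative toy rows standing in for results.csv (not the real dataset)
csv_text = """date,tournament
1872-11-30,Friendly
1930-07-13,FIFA World Cup
2022-12-18,FIFA World Cup
"""

# Read the CSV; without parse_dates the date column is read as text
toy = pd.read_csv(io.StringIO(csv_text))

# Convert date to a datetime, then pull out the year as a new column
toy["date"] = pd.to_datetime(toy["date"])
toy["year"] = toy["date"].dt.year

# Alternative: ask read_csv to parse the date column while reading
toy_parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(toy.dtypes)
print(toy["year"].tolist())  # [1872, 1930, 2022]
```

Using parse_dates is optional. The explicit pd.to_datetime step used in the task also produces a datetime column that the .dt accessor can work with.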
# Read results.csv. Assign to results.
results = pd.read_csv("results.csv")
# Inspect the column types (date is still read as text at this point)
results.dtypes
# Convert the date column to a datetime
results['date'] = pd.to_datetime(results['date'])
# Check the types again: date should now be a datetime64 type
results.dtypes
# Get the year component of date column; store in a new column named year
results['year'] = results['date'].dt.year
# See the result
results
Task 2: Get the FIFA World Cup data
- Using results, count the number of rows of each tournament value.
- Convert the results to a DataFrame for nicer printing.
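The counting pattern above (value_counts, then to_frame, then reset_index) can be sketched on a hypothetical toy frame. The tournament values and counts here are made up for illustration. The sketch shows that value_counts returns a Series indexed by tournament, that to_frame names the count column, and that reset_index turns tournament back into an ordinary column.

```python
import pandas as pd

# Hypothetical toy frame with a handful of matches (not real data)
toy = pd.DataFrame({
    "tournament": ["Friendly", "FIFA World Cup", "Friendly",
                   "UEFA Euro", "Friendly"],
})

# Count rows per tournament value; the result is a Series sorted by count
counts = toy.value_counts("tournament")

# Name the count column and move tournament from the index into a column
counts_df = counts.to_frame("number_of_matches").reset_index()
print(counts_df)
```

The Series route, toy["tournament"].value_counts(), gives the same counts. The DataFrame method is used here only to mirror the task's code.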
# Count the number of rows for each tournament; convert to DataFrame
results.value_counts('tournament').to_frame('number_of_matches').reset_index()
- Query for the rows where tournament is equal to "FIFA World Cup".
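The query step can be compared with an ordinary boolean mask. This sketch uses hypothetical toy rows. It shows that the two forms select the same rows and that the comparison is an exact string match, so a value such as "FIFA World Cup qualification" is not selected.

```python
import pandas as pd

# Hypothetical toy rows (not real match data)
toy = pd.DataFrame({
    "tournament": ["Friendly", "FIFA World Cup", "FIFA World Cup",
                   "FIFA World Cup qualification"],
    "year": [2021, 2018, 2022, 2021],
})

# query() takes a string expression; the literal inside uses double quotes
via_query = toy.query('tournament == "FIFA World Cup"')

# The same filter written as a boolean mask
via_mask = toy[toy["tournament"] == "FIFA World Cup"]

# Exact matching: the "FIFA World Cup qualification" row is left out
print(via_query)
print(via_query.equals(via_mask))  # True
```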
# Query for the rows where tournament is equal to "FIFA World Cup"
world_cup_res = results.query('tournament == "FIFA World Cup"')
# See the results
world_cup_res
Task 3: Your turn: How many matches were played in each World Cup?
- Using world_cup_res, count the number of rows of each year value.
- Convert the results to a DataFrame for nicer printing.
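value_counts orders its output by count, largest first, not by year. This sketch uses hypothetical toy years, one row per match. It shows the per-year counting step and an optional sort_index call that puts the tournaments in chronological order. The sort is an extra, not part of the task.

```python
import pandas as pd

# Hypothetical World Cup rows: one row per match (toy values, not real data)
toy_wc = pd.DataFrame({
    "year": [1930, 1934, 1934, 1938, 1938, 1938],
})

# Count matches per year; value_counts orders by count, largest first
matches_per_year = toy_wc.value_counts("year").to_frame("num_matches")

# Optional: reorder by the year index for a chronological view
chronological = matches_per_year.sort_index()
print(chronological)
```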
# Count the number of rows for each year; convert to DataFrame
matches_per_year = world_cup_res.value_counts('year').to_frame('num_matches')
# See the results
matches_per_year
- Import the plotly.express package using the alias px.