Exploring World Cup Data in Python (Solution)

This dataset (source) includes 44,066 results of international football matches, from the very first official match in 1872 up to 2022. The matches range from the FIFA World Cup to the FIFI Wild Cup to regular friendly matches. They are strictly men's full internationals: the data does not include Olympic Games or matches where at least one of the teams was the nation's B-team, a U-23 team or a league select team.

Task 1: Import and prepare the dataset

  • Import the pandas package with the usual alias.
# Import the pandas package with the usual alias
import pandas as pd
  • Read "results.csv". Assign to results.
  • Convert the date column to a datetime.
  • Get the year component of the date column; store in a new column named year.
# Read results.csv. Assign to results.
results = pd.read_csv('results.csv')

# Convert the date column to a datetime
results['date'] = pd.to_datetime(results['date'])

# Get the year component of the date column; store in a new column named year
results['year'] = results['date'].dt.year

# See the result
results
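As a minimal sketch of the same preparation step, the date conversion can also be requested while reading the file, using the parse_dates argument of pd.read_csv. The rows below are a hypothetical three-line stand-in for results.csv, and the names sample_csv and sample are made up for illustration; the real file has more columns.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for results.csv (illustration only)
sample_csv = io.StringIO(
    "date,home_team,away_team,tournament\n"
    "1872-11-30,Scotland,England,Friendly\n"
    "1930-07-13,France,Mexico,FIFA World Cup\n"
    "2022-12-18,Argentina,France,FIFA World Cup\n"
)

# parse_dates converts the date column to datetime while reading
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# The .dt accessor only works on datetime columns, so this line
# doubles as a check that the conversion happened
sample["year"] = sample["date"].dt.year

# Inspect the column types
print(sample.dtypes)
```

Either approach should leave date as a datetime column. Converting after reading, as in the cell above, keeps each step visible on its own.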

Task 2: Get the FIFA World Cup data

  • Using results, count the number of rows of each tournament value.
  • Convert the results to a DataFrame for nicer printing.
# Count the number of rows for each tournament; convert to DataFrame
results.value_counts("tournament").to_frame("num_matches")
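To see what this counting step produces without loading the full dataset, here is a small sketch on a hypothetical toy DataFrame. It calls value_counts on the selected column (the Series form) instead of on the DataFrame; both forms should give the same counts, sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical toy data standing in for results (illustration only)
toy = pd.DataFrame({
    "tournament": [
        "Friendly",
        "FIFA World Cup",
        "Friendly",
        "Friendly",
        "FIFA World Cup",
        "Copa América",
    ]
})

# Count rows per tournament; counts are sorted in descending order by default
toy_counts = toy["tournament"].value_counts().to_frame("num_matches")

# Look up the count for one tournament by its index label
friendly_matches = toy_counts.loc["Friendly", "num_matches"]
print(toy_counts)
```

The name passed to to_frame becomes the column name, and the tournament values become the index of the resulting DataFrame.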
  • Query for the rows where tournament is equal to "FIFA World Cup".
# Query for the rows where tournament is equal to "FIFA World Cup"
world_cup_res = results \
	.query('tournament == "FIFA World Cup"')

# See the results
world_cup_res
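The query string above is one way to filter rows; a boolean mask is a common alternative. The sketch below compares the two on a hypothetical toy DataFrame. The "FIFA World Cup qualification" row is included as an assumed example of a similar-looking tournament name, to show that the equality test matches the exact string only.

```python
import pandas as pd

# Hypothetical toy data standing in for results (illustration only)
toy = pd.DataFrame({
    "tournament": [
        "Friendly",
        "FIFA World Cup",
        "FIFA World Cup qualification",
        "FIFA World Cup",
    ],
    "year": [1872, 1930, 1933, 2022],
})

# Filter with a query string, as in the cell above
by_query = toy.query('tournament == "FIFA World Cup"')

# Filter with a boolean mask; this selects the same rows
by_mask = toy[toy["tournament"] == "FIFA World Cup"]

print(by_query)
```

Both filters keep the original row labels, so the result's index is not renumbered from zero.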

Task 3: Your turn: How many matches were played in each World Cup?

  • Using world_cup_res, count the number of rows of each year value.
  • Convert the results to a DataFrame for nicer printing.
# Count the number of rows for each year; convert to DataFrame
matches_per_year = world_cup_res \
	.value_counts("year") \
	.to_frame("num_matches")

# See the results
matches_per_year
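Because value_counts sorts by count, the table above is ordered from the year with the most matches to the year with the fewest, not chronologically. If a chronological view is wanted, one option is to sort by the index afterwards. This is a minimal sketch on hypothetical toy data; the match counts are made up and do not reflect real tournaments.

```python
import pandas as pd

# Hypothetical toy data standing in for world_cup_res (counts are made up)
toy_wc = pd.DataFrame({"year": [1934, 1930, 1930, 1934, 1934, 1938]})

# Count rows per year, then order the result by year instead of by count
toy_per_year = (
    toy_wc["year"]
    .value_counts()
    .sort_index()
    .to_frame("num_matches")
)

print(toy_per_year)
```

Sorting by the index is optional here, but it can make a later plot of matches over time easier to read.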
  • Import the plotly.express package using the alias px.