
Special Community FIFA World Cup - A Portfolio Project & Case Study

Analyzing FIFA World Cup Data

“The goal is to turn data into information, and information into insight.”— Carly Fiorina

Data Science Challenge

In this notebook we will explore the data and try to answer the following questions:

  • Do teams score more or less during games taking place at higher altitudes?

  • Are teams more evenly matched in later stages?

  • Do “Home” and “Away” designations affect team performance?

Import libraries

We start by importing the libraries required for the analysis.

geopy is not installed by default in Colab, so we install it here along with the other extra packages. It is preferable to do this in the first cell, so that other researchers can run the notebook without any problem.

The notebook uses the following libraries:

Library            Purpose
pandas             Data manipulation
numpy              Numerical and scientific computing
matplotlib         Data visualization
seaborn            Data visualization
plotly             Data visualization
geopy              Geocoding web services
pycountry          ISO databases
requests           Sending HTTP requests
json               JSON data interchange
IPython.display    Displaying rich output in the notebook
sklearn            Machine learning
scipy              Scientific computing and statistics
tqdm               Progress bars
folium             Data visualization (maps)
base64             Encoding binary objects such as images
io                 File-related input and output operations
PIL                Image processing
ls
%%capture --no-stderr
# install the packages that Colab does not ship by default
!pip install geopy  # geocoding web services
!pip install pycountry  # ISO databases
!pip install -U kaleido  # helper library for plotly
import pandas as pd
import seaborn as sns
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import requests
import json
import matplotlib.pyplot as plt
import plotly
import os

Import data

WorldCupMatches = pd.read_csv('WorldCupMatches.csv')
WorldCupPlayers = pd.read_csv('WorldCupPlayers.csv')
WorldCups = pd.read_csv('WorldCups.csv')

The three CSV files are read into dataframes. Let's take a look at each dataframe to see what we have and what we can do with it.

It is always good practice to look at the data before starting the analysis, so that we have a better understanding of it.

for frame in [WorldCupMatches, WorldCupPlayers, WorldCups]:
    display(frame.shape)           # (rows, columns)
    display(frame.isnull().sum())  # missing values per column
    display(frame.head())          # first five rows

The isnull().sum() output shows whether any of the files contains missing data.
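The per-column null counts can be hard to compare across three separate outputs. As a minimal sketch, the helper below collects them into one table with the share of missing values per column. The function name `missing_summary` and the toy dataframes are assumptions made for illustration; they are not part of the original notebook, and the toy values do not come from the CSV files.

```python
import pandas as pd

def missing_summary(frames):
    """Return one row per (frame, column) that has at least one missing value."""
    rows = []
    for name, frame in frames.items():
        counts = frame.isnull().sum()
        for column, n_missing in counts.items():
            rows.append({
                "frame": name,
                "column": column,
                "missing": int(n_missing),
                # share of rows in this frame where the column is null
                "share": n_missing / len(frame) if len(frame) else 0.0,
            })
    summary = pd.DataFrame(rows, columns=["frame", "column", "missing", "share"])
    return summary[summary["missing"] > 0].reset_index(drop=True)

# Toy data (hypothetical): one missing Year and one missing MatchID.
toy_matches = pd.DataFrame({"Year": [1930, None, 1934], "MatchID": [1, 2, None]})
toy_cups = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})

summary = missing_summary({"matches": toy_matches, "cups": toy_cups})
print(summary)
```

In the notebook, the same helper could be called with the real dataframes, for example `missing_summary({"matches": WorldCupMatches, "players": WorldCupPlayers, "cups": WorldCups})`.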

Merging Dataframes into one

We merge the dataframes into one because a single dataframe is easier to work with and easier to visualize than several separate ones.

First we merge the WorldCupMatches and WorldCups dataframes using the Year column as the key. Then we merge the result with the WorldCupPlayers dataframe using RoundID and MatchID as the keys.

df = pd.merge(WorldCupMatches, WorldCups, on='Year')
df = pd.merge(df, WorldCupPlayers, on=['RoundID', 'MatchID'])
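One thing worth checking after merges like these is how the row count changes: joining matches to players yields one row per player per match, so match-level columns are repeated. The sketch below illustrates this on toy data and uses the `validate` argument of `pd.merge` to assert the expected key relationships. The toy values and the `Goals` and `Player` column names are assumptions for illustration only. On the real files, `validate` may raise a `MergeError` if the keys contain duplicates, which would itself be worth investigating.

```python
import pandas as pd

# Toy stand-ins for the three dataframes (hypothetical values).
matches = pd.DataFrame({
    "Year": [1930, 1930, 1934],
    "RoundID": [201, 201, 204],
    "MatchID": [1096, 1090, 1141],
    "Goals": [4, 3, 3],
})
cups = pd.DataFrame({"Year": [1930, 1934], "Host": ["Uruguay", "Italy"]})
players = pd.DataFrame({
    "RoundID": [201, 201, 201, 204],
    "MatchID": [1096, 1096, 1090, 1141],
    "Player": ["A", "B", "C", "D"],
})

# many matches per tournament year, exactly one tournament row per year
merged = pd.merge(matches, cups, on="Year", how="left", validate="many_to_one")
# one match row fans out to many player rows
merged = pd.merge(merged, players, on=["RoundID", "MatchID"],
                  how="left", validate="one_to_many")

print(len(matches), len(merged))

# Summing a match-level column on the merged frame counts it once per player.
inflated_goals = merged["Goals"].sum()

# Keeping one row per match restores the match-level view.
match_level = merged.drop_duplicates(subset=["RoundID", "MatchID"])
true_goals = match_level["Goals"].sum()
print(inflated_goals, true_goals)
```

For match-level questions such as goals per game, it is safer to aggregate on a deduplicated view like `match_level` than on the fully merged dataframe.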