Special Community FIFA World Cup - A PORTFOLIO PROJECT & Case Study
Analyzing FIFA World Cup Data
“The goal is to turn data into information, and information into insight.”— Carly Fiorina
Data Science Challenge
In this notebook we will explore the data and try to answer the following questions:
- Do teams score more or less during games taking place at higher altitudes?
- Are teams more evenly matched in later stages?
- Do “Home” and “Away” designations affect team performance?
Import libraries
We import the libraries required for the analysis.
geopy is not installed by default in Colab, so we install it here. Installing extra packages in the first cell is preferable, because other researchers can then run the notebook without any setup problems.
The notebook uses the following libraries:
Library | Purpose |
---|---|
pandas | Data manipulation |
numpy | Numerical and scientific computing |
matplotlib | Data visualization |
seaborn | Data visualization |
plotly | Interactive data visualization |
geopy | Geocoding web services |
pycountry | ISO country databases |
requests | Sending HTTP requests |
json | JSON data interchange |
IPython.display | Rich display of objects in the notebook |
sklearn | Machine learning and preprocessing |
scipy | Scientific computing and statistics |
tqdm | Progress bars |
folium | Map visualization |
base64 | Encoding binary objects such as images |
io | File-like input and output streams |
PIL | Image processing |
ls
%%capture --no-stderr
# Install the extra libraries
!pip install geopy  # geocoding web services
!pip install pycountry  # ISO databases
!pip install -U kaleido  # helper library for plotly
import pandas as pd
import seaborn as sns
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import requests
import json
import matplotlib.pyplot as plt
import plotly
import os
Import data
WorldCupMatches = pd.read_csv('WorldCupMatches.csv')
WorldCupPlayers = pd.read_csv('WorldCupPlayers.csv')
WorldCups = pd.read_csv('WorldCups.csv')
The cell above reads the three CSV files in which the data is stored. Next, let's take a look at all the dataframes to see what we have and what we can do with it.
It is always good practice to look at the data before starting the analysis, so that we understand it better.
for i in [WorldCupMatches, WorldCupPlayers, WorldCups]:
    display(i.shape)           # number of rows and columns
    display(i.isnull().sum())  # missing values per column
    display(i.head())          # first five rows
The isnull().sum() output above shows whether any of the files has missing data.
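The raw per-column counts can be hard to compare across three dataframes of different sizes. As a minimal sketch, the helper below reports the count and percentage of missing values for only those columns that have any. The function name missing_summary and the toy frame are assumptions made for illustration; they are not part of the original notebook, and the values are made up.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for one of the loaded CSVs (hypothetical values).
toy = pd.DataFrame({
    "Year": [1930, 1934, np.nan, 1950],
    "Attendance": [4444, np.nan, np.nan, 13000],
    "Stage": ["Group 1", "Final", "Final", "Group 2"],
})
print(missing_summary(toy))
```

In the notebook, the same helper could be called on WorldCupMatches, WorldCupPlayers, and WorldCups in place of the toy frame.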
Merging Dataframes into one
We merge the dataframes into one because a single dataframe is easier to work with, and easier to visualize, than several separate ones.
We will merge the WorldCupMatches and WorldCups dataframes using the Year column as the key, and then merge the result with the WorldCupPlayers dataframe using RoundID and MatchID as the keys.
df = pd.merge(WorldCupMatches, WorldCups, on='Year')
df = pd.merge(df, WorldCupPlayers, on=['RoundID', 'MatchID'])
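A merge can silently drop or duplicate rows when keys are missing or repeated. The sketch below, which uses small made-up frames rather than the real files, shows one way to check this with the validate and indicator parameters of pd.merge. Only the key columns Year, RoundID, and MatchID come from the notebook; the other columns and all values are illustrative assumptions.

```python
import pandas as pd

# Toy stand-ins for the three dataframes (hypothetical values).
matches = pd.DataFrame({
    "Year": [1930, 1930, 1934],
    "RoundID": [201, 201, 204],
    "MatchID": [1096, 1090, 1133],
})
cups = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})
players = pd.DataFrame({
    "RoundID": [201, 201, 201, 204],
    "MatchID": [1096, 1096, 1090, 1133],
    "Player Name": ["A", "B", "C", "D"],
})

# Many matches per tournament: raises MergeError if Year repeats in `cups`.
merged = pd.merge(matches, cups, on="Year", how="left",
                  validate="many_to_one", indicator=True)
# Count matches that found no tournament row.
unmatched = (merged["_merge"] != "both").sum()
merged = merged.drop(columns="_merge")

# Many players per match: raises MergeError if a match key repeats on the left.
merged = pd.merge(merged, players, on=["RoundID", "MatchID"],
                  validate="one_to_many")
print(unmatched, merged.shape)
```

If the real files contain duplicated match rows, the second call would raise a MergeError instead of quietly multiplying player rows, which is a useful signal to deduplicate before merging.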