FIFA World Cup - A Data Science Challenge

FIFA World Cup - A Portfolio Project & Case Study

Analyzing FIFA World Cup Data

“The goal is to turn data into information, and information into insight.” - Carly Fiorina

Data Science Challenge

In this notebook, we explore the data and try to answer the following questions:

Do teams score more or less during games taking place at higher altitudes?
Are teams more evenly matched in later stages?
Do “Home” and “Away” designations affect team performance?

Import libraries

We import the libraries required for the analysis.

We install geopy, which is not available in Colab by default, together with pycountry and kaleido. Doing the installs in the first code cell means other researchers can run the notebook from top to bottom without missing-package errors.
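Running pip unconditionally reinstalls or re-checks packages every time the notebook starts. A minimal sketch, assuming you only want to install what is actually missing, checks importability first with the standard library's `importlib.util.find_spec`. The helper name `missing_packages` is hypothetical. It also assumes the pip package name matches the import name, which holds for geopy, pycountry, and kaleido but not for every package (for example, scikit-learn is imported as `sklearn`).

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in this environment (hypothetical helper)."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Modules this notebook installs with pip.
to_check = ["geopy", "pycountry", "kaleido"]
print("Missing:", missing_packages(to_check))
```

The returned list could then be passed to a single `%pip install` call, so an environment that already has everything skips the install step entirely.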

The notebook uses the following libraries:

Library	Purpose
pandas	Data manipulation
numpy	Numerical and scientific computing
matplotlib	Data visualization
seaborn	Data visualization
plotly	Interactive data visualization
kaleido	Static image export for plotly
geopy	Geocoding web services
pycountry	ISO standard databases (e.g., country codes)
requests	Sending HTTP requests
json	Parsing and writing JSON data
IPython.display	Displaying rich output in the notebook
sklearn	Machine learning and preprocessing tools
scipy	Scientific computing and statistics
tqdm	Progress bars
folium	Interactive maps
base64	Encoding binary data (such as images) as text
io	In-memory and file input/output streams
os	Operating system and file path utilities
PIL	Image processing

!ls  # list the working directory to check that the CSV files are present

%%capture --no-stderr
# Install libraries that are not available in Colab by default
!pip install geopy        # geocoding web services
!pip install pycountry    # ISO standard databases
!pip install -U kaleido   # static image export for plotly

import json
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go
import requests
import seaborn as sns
from IPython.display import display  # renders DataFrames in the inspection loop below

Import data

# Load the three CSV files from the working directory
WorldCupMatches = pd.read_csv('WorldCupMatches.csv')
WorldCupPlayers = pd.read_csv('WorldCupPlayers.csv')
WorldCups = pd.read_csv('WorldCups.csv')

The code above reads the three CSV files into DataFrames. Next, let's look at each DataFrame to see what it contains and what we can do with it.

It is always good practice to inspect the data before starting the analysis, so that we understand its structure and quality.

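# For each DataFrame, show its shape, the missing values per column, and the first five rows
for frame in [WorldCupMatches, WorldCupPlayers, WorldCups]:
    display(frame.shape)
    display(frame.isnull().sum())
    display(frame.head())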

The isnull().sum() call counts the missing values in each column, which tells us whether any of the files has missing data.
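Raw counts are hard to compare across files of different lengths. A minimal sketch, shown on a hypothetical toy DataFrame rather than the real CSVs, adds the percentage of missing values per column and counts rows in which every column is missing. Such fully empty rows can appear in exported spreadsheets, and `dropna(how="all")` removes them. The helper name `missing_summary` is hypothetical.

```python
import pandas as pd


def missing_summary(frame):
    """Count and percentage of missing values per column, most missing first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


# Hypothetical toy data standing in for one of the CSV files.
toy_frame = pd.DataFrame({
    "Year": [1930, 1930, None],
    "Stage": ["Group 1", None, None],
    "Home Team Goals": [4, 3, None],
})

print(missing_summary(toy_frame))

# Rows where every column is missing carry no information.
fully_empty = toy_frame.isnull().all(axis=1).sum()
print("Fully empty rows:", fully_empty)
toy_cleaned = toy_frame.dropna(how="all")
```

On the real data, the same helper can be applied to each of the three DataFrames in the inspection loop.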

Merging the DataFrames into one

We merge the DataFrames into a single DataFrame because one DataFrame is easier to work with and to visualize than several.

First, we merge WorldCupMatches with WorldCups using the Year column as the key. Then we merge the result with WorldCupPlayers using RoundID and MatchID as the keys.
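The two merges differ in cardinality. Each tournament row matches many match rows, and each match row matches many player rows, so the final DataFrame has one row per player per match and match-level values repeat across those rows. The sketch below shows this on hypothetical toy tables with only a few columns each. Passing `validate=` makes pandas raise a `MergeError` if a key we assume to be unique is not. The sketch also shows that a column present in both inputs but not used as a key (here a toy `Attendance` column) gets the default `_x`/`_y` suffixes.

```python
import pandas as pd

# Hypothetical toy stand-ins for the three CSV files (a few columns each).
toy_matches = pd.DataFrame({
    "Year": [1930, 1930],
    "RoundID": [1, 1],
    "MatchID": [100, 101],
    "Attendance": [1000, 2000],   # per-match value
})
toy_cups = pd.DataFrame({
    "Year": [1930],
    "Winner": ["Team X"],
    "Attendance": [3000],         # per-tournament value
})
toy_players = pd.DataFrame({
    "RoundID": [1, 1, 1],
    "MatchID": [100, 100, 101],
    "Player Name": ["P1", "P2", "P3"],
})

# Many matches share one tournament row; fail if Year repeats in toy_cups.
toy_merged = pd.merge(toy_matches, toy_cups, on="Year", validate="many_to_one")
# Each match row fans out to one row per player.
toy_merged = pd.merge(toy_merged, toy_players, on=["RoundID", "MatchID"],
                      validate="one_to_many")

print(len(toy_merged))                # one row per player per match
print(toy_merged.columns.tolist())    # shared non-key column gets _x / _y suffixes
print(toy_merged["Attendance_x"].sum())  # repeated per-match values overcount
```

Because of this repetition, match-level statistics such as goals are best computed before the player merge, or after keeping one row per match (for example with `drop_duplicates(subset="MatchID")`).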

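# Attach tournament-level information (e.g., host country and winner) to every match
df = pd.merge(WorldCupMatches, WorldCups, on='Year')
# Attach player rows to each match: one row per player per match
df = pd.merge(df, WorldCupPlayers, on=['RoundID', 'MatchID'])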
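`pd.merge` performs an inner join by default. Any match without player rows, and any player row whose (RoundID, MatchID) pair has no match, is therefore dropped without a warning. One way to check whether that happens is an outer merge with `indicator=True`, which adds a `_merge` column labelling each row `left_only`, `right_only`, or `both`. The sketch below uses hypothetical toy tables. If every row of the real data came out as `both`, the inner merge lost nothing.

```python
import pandas as pd

# Hypothetical toy tables: match 3 has no player rows,
# and the player row for match 9 has no matching match.
toy_matches = pd.DataFrame({"RoundID": [10, 10, 10], "MatchID": [1, 2, 3]})
toy_players = pd.DataFrame({
    "RoundID": [10, 10, 10, 10],
    "MatchID": [1, 1, 2, 9],
    "Player Name": ["A", "B", "C", "D"],
})

# Outer merge keeps every row; indicator=True records where its keys were found.
check = pd.merge(toy_matches, toy_players, on=["RoundID", "MatchID"],
                 how="outer", indicator=True)
print(check["_merge"].value_counts())

# The default inner merge keeps only the "both" rows.
inner = pd.merge(toy_matches, toy_players, on=["RoundID", "MatchID"])
print(len(inner))
```

The same check works for the Year merge between WorldCupMatches and WorldCups.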