Exploratory Data Analysis in Python for Beginners
Exploratory Data Analysis in Python
- We will be using a dataset containing data from the English Premier League between 2000 and 2022.
- We will be using DataFrames to examine the dataset in different ways, along with line plots, bar plots, and scatter plots.
Task 0: Setup
For this analysis we need the pandas and seaborn Python packages in order to analyze our dataset and generate a few different plots.
Instructions
Import the following packages.
- Import pandas using the alias pd.
- Import seaborn using the alias sns.
- Import the matplotlib.pyplot package using the alias plt.
- From the IPython.display package, import display and Markdown.
# Import Matplotlib, pandas, and Seaborn
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# From the IPython.display package, import display and Markdown
from IPython.display import display, Markdown
Task 1: Import the EPL data
The English Premier League data is contained in a CSV file named EPL.csv.
The dataset contains the following columns.
- Season: Season year(s)
- Pos: Final position that season
- Team: Name of team
- Pld: Number of matches played
- W: Number of wins
- D: Number of draws
- L: Number of losses
- GF: Goals scored that season
- GA: Goals conceded that season
- GD: Difference in goals scored vs. conceded
- Pts: Total points at end of season
- Qualification or relegation: Result at end of season
Instructions
Import the EPL dataset to a pandas dataframe.
- Read the data from EPL.csv. Assign to epl.
- Print the column info and head to take a look at the data.
- Select only the 'Season', 'Team', 'Pos', 'Pts', 'GF', 'GD', and 'Qualification or relegation' columns and save them to a new variable epl_condensed.
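# Load the EPL data into a DataFrame
epl = pd.read_csv("EPL.csv")
# info() prints its summary directly and returns None, so it is not wrapped in print()
epl.info()
print(epl.head())
# Keep only the columns used in this analysis; .copy() avoids SettingWithCopyWarning on later edits
epl_condensed = epl[['Season','Team','Pos','Pts','GF','GD', 'Qualification or relegation']].copy()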
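With the data loaded, the column descriptions suggest a quick sanity check: GD should equal GF minus GA for every row. The sketch below runs that check on a small stand-in DataFrame so it works without EPL.csv. The rows and team names are made up for illustration, and it assumes GD was parsed as a numeric column.

```python
import pandas as pd

# Hypothetical stand-in rows; in the tutorial, `epl` comes from EPL.csv.
sample = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "GF": [80, 55, 30],
    "GA": [30, 55, 70],
    "GD": [50, 0, -40],
})

# Rows where the stored goal difference disagrees with GF - GA
mismatches = sample[sample["GD"] != sample["GF"] - sample["GA"]]
print(f"{len(mismatches)} inconsistent rows")
```

Running the same comparison on epl would flag any rows worth a closer look before plotting. If GD was read in as text rather than numbers, it would need converting first.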
Task 2: Clean up the Qualification column to make it less wordy
We need to make some changes to the imported data in order to make it a little easier to read and work with.
- As we can see from calling the .head() method on the dataframe, we have one column that uses a lot of text, making it difficult to read and work with.
- In this section we are going to write a function that goes through our newly created dataframe, editing the values in the 'Qualification or relegation' column.
- This will make the data easier to work with when it comes time to create charts/plots.
- This function will make some assumptions. Entries containing:
  - Champions League will be simplified to Champions League
  - UEFA or Europa will be simplified to Europa
  - Relegation will be simplified to Relegation
- Anything else will be converted to -.
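The rules above can be sketched as a small function. This is one possible shape rather than the only solution: the name simplify_qualification and the sample strings are made up for illustration and are not taken from EPL.csv. The order of the checks matters, because an entry that mentions both UEFA and Champions League should be caught by the Champions League rule first.

```python
import pandas as pd

def simplify_qualification(value):
    """Collapse a wordy qualification/relegation entry into a short label."""
    text = str(value)  # missing values become the text "None" or "nan"
    # Check Champions League first: such entries may also mention UEFA.
    if "Champions League" in text:
        return "Champions League"
    elif "UEFA" in text or "Europa" in text:
        return "Europa"
    elif "Relegation" in text:
        return "Relegation"
    else:
        return "-"

# Hypothetical example entries, including a missing value
samples = pd.Series([
    "Qualification for the Champions League group stage",
    "Qualification for the UEFA Cup first round",
    "Qualification for the Europa League group stage",
    "Relegation to the Championship",
    None,
])

simplified = samples.apply(simplify_qualification)
print(simplified.tolist())
```

On the real data, the same function could be passed to .apply() on the 'Qualification or relegation' column of epl_condensed.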
Instructions
- Define a function that uses conditional statements to update the qualification or relegation values, following the logic given in the task context.