Exploratory Data Analysis in Python for Beginners
Code-along | 2023-12-19 | Exploratory Data Analysis in Python for Beginners | George Cunningham
- We will be using a dataset containing data from the English Premier League between 2000 and 2022.
- We will be using DataFrames to examine the dataset in different ways, along with line plots, bar plots, and scatter plots.
Task 0: Setup
For this analysis we need the pandas, Seaborn, and Matplotlib Python packages to analyze our dataset and generate a few different plots.
Instructions
Import the following packages.
- Import pandas using the alias pd.
- Import seaborn using the alias sns.
- Import the matplotlib.pyplot package using the alias plt.
- From the IPython.display package, import display and Markdown.
# Import Matplotlib, pandas, and Seaborn
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# From the IPython.display package, import display and Markdown
from IPython.display import display, Markdown
Task 1: Import the EPL data
The English Premier League data is contained in a CSV file named EPL.csv.
The dataset contains the following columns.
- Season: season year(s)
- Pos: final position that season
- Team: name of team
- Pld: number of matches played
- W: number of wins
- D: number of draws
- L: number of losses
- GF: goals scored that season
- GA: goals conceded that season
- GD: difference between goals scored and goals conceded
- Pts: total points at end of season
- Qualification or relegation: result at end of season
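Several of these columns depend on one another: Pld should equal W + D + L, GD should equal GF minus GA, and, under the Premier League's 3 points for a win and 1 for a draw, Pts should equal 3 × W + D. The sketch below checks these relationships. It uses a small made-up DataFrame rather than EPL.csv (the team names and numbers are invented), and it assumes the W, D, L, GF, GA, GD, and Pts columns are numeric. Team C's points are deliberately set 9 below 3 × W + D to show what a mismatch, such as one caused by a points deduction, looks like.

```python
import pandas as pd

# Made-up rows for illustration only; these are not real EPL results.
toy = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team C"],
    "Pld": [38, 38, 38],
    "W": [25, 12, 8],
    "D": [8, 10, 9],
    "L": [5, 16, 21],
    "GF": [80, 45, 30],
    "GA": [30, 50, 65],
    "GD": [50, -5, -35],
    "Pts": [83, 46, 24],  # Team C set 9 points below 3*W + D on purpose
})

# Matches played should equal wins + draws + losses
toy["pld_ok"] = toy["Pld"] == toy["W"] + toy["D"] + toy["L"]
# Goal difference should equal goals scored minus goals conceded
toy["gd_ok"] = toy["GD"] == toy["GF"] - toy["GA"]
# Assumes 3 points per win and 1 per draw; any deduction breaks this check
toy["pts_ok"] = toy["Pts"] == 3 * toy["W"] + toy["D"]

print(toy[["Team", "pld_ok", "gd_ok", "pts_ok"]])
```

On the real data, the same comparisons can be run against epl once it is loaded, provided those columns were read as numbers (a GD column stored as text such as "+43" would need converting first).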
Instructions
Import the EPL dataset into a pandas DataFrame.
- Read the data from EPL.csv. Assign to epl.
- Print the column info and head to take a look at the data.
- Select only the 'Season', 'Team', 'Pos', 'Pts', 'GF', 'GD', and 'Qualification or relegation' columns and save them to a new variable epl_condensed.
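# Read the EPL data into a DataFrame
epl = pd.read_csv("EPL.csv")
# DataFrame.info() prints its summary itself and returns None, so no print() is needed
epl.info()
print(epl.head())
# Keep only the columns used in the analysis; .copy() makes an independent DataFrame
epl_condensed = epl[['Season', 'Team', 'Pos', 'Pts', 'GF', 'GD', 'Qualification or relegation']].copy()
Task 2: Clean up the Qualification column to make it less wordy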
We need to make some changes to the imported data to make it easier to read and work with.
- As we can see from calling the .head() method on the DataFrame, one column contains a lot of text, making it difficult to read and work with.
- In this section we are going to write a function that goes through our newly created DataFrame and edits the values in the 'Qualification or relegation' column.
- This will make the data easier to work with when it comes time to create charts and plots.
- This function will make some assumptions. Entries containing:
  - Champions League will be simplified to Champions League
  - UEFA or Europa will be simplified to Europa
  - Relegation will be simplified to Relegation
  - Anything else will be converted to a dash (-).
Instructions
- Define a function that uses conditional statements to update the 'Qualification or relegation' values, following the logic given in the task context.
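A minimal sketch of one way to write this function follows. The name simplify_qualification and the sample entries are hypothetical: this is not the code-along's own solution, and the real wording in EPL.csv may differ. It uses plain substring checks, which are case-sensitive, and treats missing values (NaN) as a dash.

```python
import pandas as pd


def simplify_qualification(value):
    """Collapse a 'Qualification or relegation' entry into a short label (hypothetical helper)."""
    # Missing values (NaN/None) and other non-strings fall through to '-'
    if not isinstance(value, str):
        return "-"
    # Check Champions League first so it takes priority over the UEFA/Europa rule
    if "Champions League" in value:
        return "Champions League"
    if "UEFA" in value or "Europa" in value:
        return "Europa"
    if "Relegation" in value:
        return "Relegation"
    return "-"


# Hypothetical entries for illustration; not copied from EPL.csv
sample = pd.DataFrame({
    "Qualification or relegation": [
        "Qualification for the Champions League group stage",
        "Qualification for the UEFA Cup first round",
        "Relegation to the Football League Championship",
        None,
    ]
})

# Apply the function to every entry and overwrite the column with the short labels
sample["Qualification or relegation"] = sample["Qualification or relegation"].apply(
    simplify_qualification
)
print(sample)
```

The order of the checks matters: an entry such as "UEFA Champions League" contains both "UEFA" and "Champions League", so testing for Champions League first keeps it from being labelled Europa. On the real data, the same .apply() call would be made on epl_condensed['Qualification or relegation'].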