Exploratory Data Analysis in Python for Beginners

Exploratory Data Analysis in Python

  • We will be using a dataset containing data from the English Premier League between 2000 and 2022.
  • We will be using DataFrames to examine the dataset in different ways, along with line plots, bar plots, and scatter plots.

Task 0: Setup

For this analysis we need the pandas and seaborn Python packages in order to analyze our dataset and generate a few different plots.

Instructions

Import the following packages.

  • Import pandas using the alias pd.
  • Import seaborn using the alias sns.
  • Import the matplotlib.pyplot package using the alias plt.
  • From the IPython.display package, import display and Markdown.
# Import Matplotlib, pandas, and Seaborn
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# From the IPython.display package, import display and Markdown
from IPython.display import display, Markdown

Task 1: Import the EPL data

The English Premier League data is contained in a CSV file named EPL.csv.

The dataset contains the following columns.

  • Season: Season year(s)
  • Pos: Final position that season
  • Team: Name of team
  • Pld: Number of matches played
  • W: Number of wins
  • D: Number of draws
  • L: Number of losses
  • GF: Goals scored that season
  • GA: Goals conceded that season
  • GD: Difference in goals scored vs. conceded
  • Pts: Total points at end of season
  • Qualification or relegation: Result at end of season
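Several of these columns are related by simple arithmetic, which makes for a quick sanity check after loading. The sketch below uses two made-up rows (not taken from EPL.csv) and assumes the numeric columns are stored as numbers; if GD is read in as text, it would need converting first.

```python
import pandas as pd

# Made-up rows for illustration only; these are not taken from EPL.csv.
sample = pd.DataFrame({
    "Team": ["Team A", "Team B"],
    "Pld": [38, 38],
    "W": [20, 10],
    "D": [10, 8],
    "L": [8, 20],
    "GF": [60, 35],
    "GA": [40, 55],
    "GD": [20, -20],
})

# GD should equal goals scored minus goals conceded.
gd_matches = (sample["GF"] - sample["GA"]).eq(sample["GD"])

# Matches played should equal wins + draws + losses.
pld_matches = (sample["W"] + sample["D"] + sample["L"]).eq(sample["Pld"])

print(gd_matches.all(), pld_matches.all())
```

The same two comparisons could be run on epl once it is loaded; any row where a check fails is worth inspecting by hand.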

Instructions

Import the EPL dataset to a pandas dataframe.

  • Read the data from EPL.csv. Assign to epl.
  • Print the column info and head to take a look at the data.
  • Select only the 'Season', 'Team', 'Pos', 'Pts', 'GF', 'GD', and 'Qualification or relegation' columns and save them to a new variable epl_condensed.
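# Read the CSV file into a DataFrame
epl = pd.read_csv("EPL.csv")

# info() prints its own summary and returns None, so call it directly
epl.info()
print(epl.head())

# Keep only the columns needed for the analysis; .copy() makes an independent DataFrame
epl_condensed = epl[['Season', 'Team', 'Pos', 'Pts', 'GF', 'GD', 'Qualification or relegation']].copy()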

Task 2: Clean up the Qualification column to make it less wordy

We need to make some changes to the imported data to make it a little easier to read and work with.

  • As we can see from calling the .head() method on the dataframe, one column uses a lot of text, making it difficult to read and work with.
  • In this section we are going to write a function that goes through our newly created dataframe and edits the values in the 'Qualification or relegation' column.
  • This will make the data easier to work with when it comes time to create charts/plots.
  • This function will make some assumptions. Entries containing:
    • Champions League will be simplified to Champions League
    • UEFA or Europa will be simplified to Europa
    • Relegation will be simplified to Relegation
    • Anything else will be converted to -.
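The rules above can be sketched as a small function. This is only one possible way to write it: the name simplify_qualification and the sample strings are hypothetical, and the checks are case-sensitive substring matches, which is an assumption about how the values are spelled in the data.

```python
import pandas as pd

def simplify_qualification(value):
    """Collapse a verbose qualification/relegation string into a short label."""
    text = str(value)  # str() also copes with missing values such as NaN
    # Check Champions League first so an entry that also mentions UEFA
    # is not labelled as Europa.
    if "Champions League" in text:
        return "Champions League"
    if "UEFA" in text or "Europa" in text:
        return "Europa"
    if "Relegation" in text:
        return "Relegation"
    return "-"

# Hypothetical example values, not copied from EPL.csv.
samples = pd.Series([
    "Qualification for the Champions League group stage",
    "Qualification for the UEFA Cup first round",
    "Qualification for the Europa League group stage",
    "Relegation to the Championship",
    "Not Applicable",
])

simplified = samples.apply(simplify_qualification)
print(simplified.tolist())
```

The order of the conditions matters here: putting the Champions League test first keeps the most specific label from being overridden by the broader UEFA match.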

Instructions

  • Define a function using conditional statements to update the 'Qualification or relegation' values using the logic given in the task context.