Skip to content
Premier League 18/19 Stats Data
Soccer Data Analysis
This dataset contains data of every game from the 2018-2019 season in the English Premier League.
# original data set; no modifications.
import pandas as pd
soccer = pd.read_csv("soccer18-19.csv")
print(soccer.shape)
soccer.head(100)Data Dictionary
| Column | Explanation |
|---|---|
| Div | Division the game was played in |
| Date | The date the game was played |
| HomeTeam | The home team |
| AwayTeam | The away team |
| FTHG | Full time home goals |
| FTAG | Full time away goals |
| FTR | Full time result |
| HTHG | Half time home goals |
| HTAG | Half time away goals |
| HTR | Half time result |
| Referee | The referee of the game |
| HS | Number of shots taken by home team |
| AS | Number of shots taken by away team |
| HST | Number of shots taken by home team on target |
| AST | Number of shots taken by away team on target |
| HF | Number of fouls made by home team |
| AF | Number of fouls made by away team |
| HC | Number of corners taken by home team |
| AC | Number of corners taken by away team |
| HY | Number of yellow cards received by home team |
| AY | Number of yellow cards received by away team |
| HR | Number of red cards received by home team |
| AR | Number of red cards received by away team |
Source of dataset.
Fixing DataSet
Before we making SQL analyis, let's make the following changes to the DB:
- Add a matchID index column
- Drop 'Div' column as it's unnecessary
- Replace the 'AS' column name with 'ASS' because 'AS' is an SQL keyword and might cause errors
# create match_id column numbering it 1 - 380
soccer.insert(0, 'MatchID', range(1, len(soccer) + 1))
# drop 'Div' column
soccer = soccer.drop(columns='Div')
# replace column AS with ASS because AS is an SQL keyword and would interrupt future calculations
soccer['ASS'] = soccer.pop('AS')
soccer.head(5)2018/19 Season Premier League Table
- Summary of main end-of-season
- SORT: by points (highest on top!)
- To see code, toggle 'Hidden code' ABOVE table or the eye icon to the TOP LEFT of table 👇
DataFrameas
df1
variable
-- Count matches won and lost for each team
SELECT
team,
COUNT(MatchID) AS MP,
COUNT(CASE WHEN result = 'win' THEN 1 END) AS W,
COUNT(CASE WHEN result = 'draw' THEN 1 END) AS D,
COUNT(CASE WHEN result = 'loss' THEN 1 END) AS L,
SUM(goals_for) AS GF,
SUM(goals_against) AS GA,
SUM(goal_difference) AS GD,
SUM(point) AS points
FROM (
SELECT
HomeTeam AS team,
MatchID,
CASE
WHEN FTHG > FTAG THEN 'win'
WHEN FTHG < FTAG THEN 'loss'
ELSE 'draw'
END AS result,
FTHG as goals_for,
FTAG as goals_against,
(FTHG - FTAG) AS goal_difference,
CASE
WHEN FTHG > FTAG THEN 3
WHEN FTHG = FTAG THEN 1
ELSE 0
END AS point
FROM
soccer
UNION ALL
SELECT
AwayTeam AS team,
MatchID,
CASE
WHEN FTAG > FTHG THEN 'win'
WHEN FTAG < FTHG THEN 'loss'
ELSE 'draw'
END AS result,
FTAG as goals_for,
FTHG as goals_against,
(FTAG - FTHG) AS goal_difference,
CASE
WHEN FTHG < FTAG THEN 3
WHEN FTHG = FTAG THEN 1
ELSE 0
END AS point
FROM
soccer
) AS matches
GROUP BY
team
ORDER BY
points DESC;{Premier League Winners 2018/19. (n.d.). https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.mancitycore.com%2Fmanchester-city-crowned-premier-league-2018-19-champions%2F&psig=AOvVaw3WowV1hBSEKPP36JmQhz__&ust=1721973458556000&source=images&cd=vfe&opi=89978449&ved=0CBQQjRxqFwoTCOju6pPBwYcDFQAAAAAdAAAAABAE}
Shooting Statistics
- Summary of major shooting⚽️ statistics like shots taken, faced, cleansheets 🧤, goaless games, etc
- SORT: Alphabetical
Hidden code
df