Playing with dataset from the course Intermediate SQL
This course contained data on UEFA soccer matches. I show how we can simplify a query from the course, and build a query to extract information on longest winning streaks by a team during a season.
Simplifying the query from final exercise.
For reference, here is the query we had to complete:
-- Add your own queries here
-- Set up the home team CTE
WITH home AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss'
ELSE 'Tie' END AS outcome
FROM soccer.match AS m
LEFT JOIN soccer.team AS t ON m.hometeam_id = t.team_api_id),
-- Set up the away team CTE
away AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
WHEN m.home_goal < m.away_goal THEN 'MU Win'
ELSE 'Tie' END AS outcome
FROM soccer.match AS m
LEFT JOIN soccer.team AS t ON m.awayteam_id = t.team_api_id)
-- Select columns and and rank the matches by goal difference
SELECT DISTINCT
m.date,
home.team_long_name AS home_team,
away.team_long_name AS away_team,
m.home_goal, m.away_goal,
RANK() OVER(ORDER BY ABS(home_goal - away_goal) DESC) as match_rank
-- Join the CTEs onto the match table
FROM soccer.match AS m
INNER JOIN home ON m.id = home.id
INNER JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
AND ((home.team_long_name = 'Manchester United' AND home.outcome = 'MU Loss')
OR (away.team_long_name = 'Manchester United' AND away.outcome = 'MU Loss'));
We can do something equivalent using a simple join on a CASE WHEN. We could of course add info on whether MU was the home or away team by comparing home and away goals.
SELECT
m.date,
t.team_long_name AS winning_team,
m.home_goal,
m.away_goal,
RANK() OVER(ORDER BY ABS(m.home_goal - m.away_goal) DESC) as match_rank
FROM soccer.match AS m
INNER JOIN soccer.team AS t
ON (CASE WHEN m.home_goal < m.away_goal THEN m.awayteam_id
WHEN m.home_goal > m.away_goal THEN m.hometeam_id END) = t.team_api_id
WHERE m.season = '2014/2015' AND (
(m.hometeam_id = 10260 AND m.home_goal < m.away_goal)
OR (m.awayteam_id = 10260 and m.home_goal > m.away_goal))
ORDER BY date;
Winning streaks
Now for something more interesting: identify longest winning streaks by a team in a given season.
First, given a team id and season (e.g., Manchester United in the 2011/2012 season), we identify wins, and record when the last defeat happened in order to identify winning streaks. This will later appear as a correlated subquery.
SELECT *,
(ROW_NUMBER() OVER(ORDER BY date)) - (MAX(last_loss_rank) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS win_streak
FROM (SELECT *,
-- rank if defeat, 0 otherwise. Just apply running max to retrieve rank of last defeat:
-- multiplying by (1-team_win) is slightly faster than the more legible CASE which I commented out:
--CASE WHEN team_win=0 THEN (ROW_NUMBER() OVER(ORDER BY date)) ELSE 0 END AS last_loss_rank
(1.0-team_win)*(ROW_NUMBER() OVER(ORDER BY date)) AS last_loss_rank
FROM (SELECT
id,season,date,
hometeam_id,awayteam_id,
home_goal,away_goal,
CASE WHEN
(m.home_goal > m.away_goal AND m.hometeam_id = 10260) OR
(m.home_goal < m.away_goal AND m.awayteam_id = 10260) THEN 1
ELSE 0 END AS team_win
FROM soccer.match AS m
WHERE m.season = '2011/2012' AND (m.hometeam_id = 10260 OR m.awayteam_id = 10260)) AS sub_a) AS sub_b;
Now, doing is for all teams and seasons:
WITH team_seasons_list AS ( -- prepare a table with all teams and the seasons in which they played
SELECT t.team_api_id, t.team_long_name, m.season
FROM (SELECT DISTINCT hometeam_id AS team_api_id, season
FROM soccer.match
UNION
SELECT DISTINCT awayteam_id AS team_api_id, season
FROM soccer.match) AS m
LEFT JOIN soccer.team AS t
USING(team_api_id)
)
-- For a given team and season, calculate the longest streak
SELECT team_long_name, season,
(SELECT MAX(win_streak) as longest_streak --, team_seasons_list.team_long_name, team_seasons_list.season
FROM (SELECT (ROW_NUMBER() OVER(ORDER BY date)) - (MAX(last_loss_rank) OVER(ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS win_streak
FROM (SELECT *,
-- rank if defeat, 0 otherwise. Just apply running max to retrieve rank of last defeat:
-- maybe multiplying by (1-team_win) would be faster than CASE...
CASE WHEN team_win=0 THEN (ROW_NUMBER() OVER(ORDER BY date)) ELSE 0 END AS last_loss_rank
FROM (SELECT
id,season,date,
hometeam_id,awayteam_id,
home_goal,away_goal,
CASE WHEN
(m.home_goal > m.away_goal AND m.hometeam_id = team_seasons_list.team_api_id) OR
(m.home_goal < m.away_goal AND m.awayteam_id = team_seasons_list.team_api_id) THEN 1
ELSE 0 END AS team_win
FROM soccer.match AS m
WHERE m.season=team_seasons_list.season AND (m.hometeam_id = team_seasons_list.team_api_id OR m.awayteam_id = team_seasons_list.team_api_id)) AS sub_a) AS sub_b
) AS sub_win_streaks)
FROM team_seasons_list
-- Sort the table by longest streaks, descending
ORDER BY longest_streak DESC;
Bayern Munich sure had an impressive 2013/2014 season!
https://www.bundesliga.com/en/news/Bundesliga/0000282346.jsp