Skip to content

⚽ UEFA Insights – 1. Win Probability Accuracy (Per Stage/Per Team)

This notebook visualizes how accurately the prediction model performed across different tournament stages of UEFA Championship, seasons 2020, 2021, 2022.
Data source: UEFA match data provided in DataCamp's Data Sources repository for DataLab projects hosted on Snowflake.

Spinner
DataFrameas
win_probability_accuracy_stage
variable
-- Evaluates how accurately the prediction model performed across different tournament stages (2020–2022).
-- Output is saved in a data frame 'win_probability_accuracy_stage' for later visualization.

WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
match_results AS (
  SELECT
	STAGE,
    CASE
      WHEN PREDICTION_TEAM_HOME_WIN > PREDICTION_DRAW AND PREDICTION_TEAM_HOME_WIN > PREDICTION_TEAM_AWAY_WIN THEN 'HOME_WIN'
      WHEN PREDICTION_TEAM_AWAY_WIN > PREDICTION_TEAM_HOME_WIN AND PREDICTION_TEAM_AWAY_WIN > PREDICTION_DRAW THEN 'AWAY_WIN'
      ELSE 'DRAW'
    END AS PREDICTED_RESULT,

    CASE
      WHEN TEAM_HOME_SCORE > TEAM_AWAY_SCORE THEN 'HOME_WIN'
      WHEN TEAM_HOME_SCORE < TEAM_AWAY_SCORE THEN 'AWAY_WIN'
      ELSE 'DRAW'
    END AS ACTUAL_RESULT
  FROM all_matches
),
accuracy_check AS (
  SELECT *,
         CASE WHEN PREDICTED_RESULT = ACTUAL_RESULT THEN 1 ELSE 0 END AS IS_CORRECT
  FROM match_results
)
SELECT 
  STAGE, 	
  COUNT(*) AS TOTAL_MATCHES,
  SUM(IS_CORRECT) AS CORRECT_PREDICTIONS,
  ROUND(SUM(IS_CORRECT) / COUNT(*), 3) AS ACCURACY
FROM accuracy_check
GROUP BY STAGE;
Spinner
DataFrameas
win_probability_accuracy_team
variable
-- Evaluates how accurately the prediction model performed across all teams during UEFA tournaments (2020–2022).
-- Output is saved in a data frame 'win_probability_accuracy_team' for later visualization.

WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
match_prediction_vs_actual AS (
  SELECT
	*,
    CASE
      WHEN PREDICTION_TEAM_HOME_WIN > PREDICTION_DRAW AND PREDICTION_TEAM_HOME_WIN > PREDICTION_TEAM_AWAY_WIN THEN 'HOME_WIN'
      WHEN PREDICTION_TEAM_AWAY_WIN > PREDICTION_TEAM_HOME_WIN AND PREDICTION_TEAM_AWAY_WIN > PREDICTION_DRAW THEN 'AWAY_WIN'
      ELSE 'DRAW'
    END AS PREDICTED_RESULT,

    CASE
      WHEN TEAM_HOME_SCORE > TEAM_AWAY_SCORE THEN 'HOME_WIN'
      WHEN TEAM_HOME_SCORE < TEAM_AWAY_SCORE THEN 'AWAY_WIN'
      ELSE 'DRAW'
    END AS ACTUAL_RESULT
  FROM all_matches
),
team_results AS (
  SELECT
    TEAM_NAME_HOME AS TEAM_NAME,
    'HOME' AS SIDE,
    PREDICTED_RESULT,
    ACTUAL_RESULT,
    CASE
      WHEN PREDICTED_RESULT = 'HOME_WIN' THEN 'WIN'
      WHEN PREDICTED_RESULT = 'AWAY_WIN' THEN 'LOSS'
      ELSE 'DRAW'
    END AS PREDICTED_OUTCOME,
    CASE
      WHEN TEAM_HOME_SCORE > TEAM_AWAY_SCORE THEN 'WIN'
      WHEN TEAM_HOME_SCORE < TEAM_AWAY_SCORE THEN 'LOSS'
      ELSE 'DRAW'
    END AS ACTUAL_OUTCOME
  FROM match_prediction_vs_actual

  UNION ALL

  SELECT
    TEAM_NAME_AWAY AS TEAM_NAME,
    'AWAY' AS SIDE,
    PREDICTED_RESULT,
    ACTUAL_RESULT,
    CASE
      WHEN PREDICTED_RESULT = 'AWAY_WIN' THEN 'WIN'
      WHEN PREDICTED_RESULT = 'HOME_WIN' THEN 'LOSS'
      ELSE 'DRAW'
    END AS PREDICTED_OUTCOME,
    CASE
      WHEN TEAM_AWAY_SCORE > TEAM_HOME_SCORE THEN 'WIN'
      WHEN TEAM_AWAY_SCORE < TEAM_HOME_SCORE THEN 'LOSS'
      ELSE 'DRAW'
    END AS ACTUAL_OUTCOME
  FROM match_prediction_vs_actual
),
accuracy_check AS (
  SELECT *,
         CASE WHEN PREDICTED_OUTCOME = ACTUAL_OUTCOME THEN 1 ELSE 0 END AS IS_CORRECT
  FROM team_results
)
SELECT 
	TEAM_NAME,
	COUNT(*) AS TOTAL_MATCHES,
	SUM(IS_CORRECT) AS CORRECT_PREDICTIONS,
	ROUND(SUM(IS_CORRECT) / COUNT(*), 3) AS ACCURACY
FROM accuracy_check
GROUP BY TEAM_NAME
HAVING COUNT(*) > 1 AND SUM(IS_CORRECT) > 0
ORDER BY ACCURACY DESC
LIMIT 15;

Visualization Setup

Data Source: Data frames win_probability_accuracy_stage and win_probability_accuracy_team generated by SQL queries.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter
win_probability_accuracy_stage.sort_values('ACCURACY', ascending = False).head()
plt.figure(figsize = (8, 5))
sns.barplot(data = win_probability_accuracy_stage, x = 'STAGE', y = 'ACCURACY')
plt.title('Prediction Accuracy by Stage')
plt.ylabel('Accuracy')
plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax = 1))
plt.xlabel('Stage')
plt.xticks(rotation = 90)
plt.tight_layout()
plt.text(15,-0.48,'Source: SQL analysis of prediction accuracy from UEFA 2020–2022 dataset.')
plt.show()
win_probability_accuracy_team.sort_values('ACCURACY', ascending = False).head()
plt.figure(figsize = (8, 5))
sns.barplot(data = win_probability_accuracy_team, x = 'TEAM_NAME', y = 'ACCURACY')
plt.title('Prediction Accuracy by Team')
plt.ylabel('Accuracy')
plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax=1))
plt.xlabel('Team')
plt.xticks(rotation = 90)
plt.tight_layout()
plt.text(15,-0.7,'Source: SQL analysis of prediction accuracy from UEFA 2020–2022 dataset.')
plt.show()

Insight Summary: Win Probability Accuracy Tracker

By Stage

  • The prediction model was most accurate in earlier group stage rounds and quarter-finals, reaching 50%.
  • Performance dropped sharply in the semis and final, with 0% accuracy in some stages.
  • This suggests the model struggles in high-stakes, unpredictable matches.

By Team (Top Performers)

  • The model predicted matches involving lesser-known or lower-seeded teams (e.g., Ferencváros, Başakşehir) more accurately.
  • Accuracy remained high (~70–75%) for top clubs like Liverpool and Sevilla, but lower than for smaller clubs — suggesting some bias or unpredictability in bigger games.
  • The model performed best with teams that likely followed more consistent patterns.