Skip to content

⚽ UEFA Insights – 3. Most Efficient Teams (Shots on Target per Shot Attempt)

Main Goal To create visualizations that showcase the teams with the highest efficiency in converting shot attempts into shots on target during the UEFA Championship stages for the 2020, 2021, and 2022 seasons.

Data source: UEFA match data provided in DataCamp's Data Sources repository for DataLab projects hosted on Snowflake.

Dataset Overview

Quick overview of a dataset for UEFA season 2020, SOCCER.TBL_UEFA_2020

Spinner
DataFrameas
df
variable
-- Explore the data in the table
SELECT *
FROM SOCCER.TBL_UEFA_2020
LIMIT 5

SQL Query Setup

  • The query combines match data from three different tables (2020, 2021, 2022) into a single dataset.
  • It calculates the total number of shots and shots on target for each team, both home and away.
  • Finally, it computes the shooting efficiency for each team and orders the results by efficiency in descending order.
  • Output is saved in a data frame shooting_efficiency_per_team for later visualization.
Spinner
DataFrameas
shooting_efficiency_per_team
variable
WITH all_matches AS (
    SELECT * FROM SOCCER.TBL_UEFA_2020
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2021
    UNION ALL
    SELECT * FROM SOCCER.TBL_UEFA_2022
),
all_shots AS (
    SELECT 
        TEAM_NAME_HOME AS TEAM, 
        TOTAL_SHOTS_HOME AS SHOTS,
        SHOTS_ON_TARGET_HOME AS TARGETS
    FROM all_matches

    UNION ALL

    SELECT 
        TEAM_NAME_AWAY AS TEAM, 
        TOTAL_SHOTS_AWAY AS SHOTS,
        SHOTS_ON_TARGET_AWAY AS TARGETS
    FROM all_matches
),
team_total_shots AS (
    SELECT 
        TEAM,
        SUM(SHOTS) AS TOTAL_SHOTS,
        SUM(TARGETS) AS TOTAL_TARGETS
    FROM all_shots
    GROUP BY TEAM
)
SELECT
    TEAM,
    TOTAL_TARGETS,
    TOTAL_SHOTS,
    ROUND(TOTAL_TARGETS / NULLIF(TOTAL_SHOTS, 0), 3) AS EFFICIENCY
FROM team_total_shots
ORDER BY EFFICIENCY DESC;

Visualization Setup

Data Source: Data frame shooting_efficiency_per_team generated by a SQL query.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Data Frame Preview

Quick preview of a data frame shooting_efficiency_per_team obtained as a result of the SQL query execution.

shooting_efficiency_per_team.sort_values('EFFICIENCY', ascending = False).head()

Potential Data Discrepancies

Upon reviewing the query output, an anomaly was detected in the computed metric 'EFFICIENCY', necessitating further examination...

# Analyze the 'EFFICIENCY' column to check for values above 1.0
anomalies = shooting_efficiency_per_team[shooting_efficiency_per_team['EFFICIENCY'] > 1.0]

# Display the rows where 'EFFICIENCY' is above 1.0
anomalies

Anomaly Analysis

  • 'EFFICIENCY' is calculated as TOTAL_TARGETS / TOTAL_SHOTS.
  • Since TOTAL_TARGETS cannot exceed TOTAL_SHOTS, 'EFFICIENCY' should not logically be above 1.0.
  • If there are any values above 1.0, they are likely due to data errors and should be reviewed or corrected.
# Correct the anomalies by setting 'EFFICIENCY' to NaN
shooting_efficiency_per_team.loc[shooting_efficiency_per_team['EFFICIENCY'] > 1.0, 'EFFICIENCY'] = float('nan')

# Display the corrected data frame
shooting_efficiency_per_team.sort_values('EFFICIENCY', ascending = False).head()

Anomaly Correction

  • By setting 'EFFICIENCY' to NaN for values above 1.0, we effectively mark these entries as invalid or erroneous.
  • This allows for easy identification and handling of these anomalies in subsequent data processing steps.
  • Using NaN ensures that these values are excluded from calculations and analyses, preventing them from skewing results.
# Sort the data by 'EFFICIENCY' in descending order and select the top 15 teams
top_15_teams = shooting_efficiency_per_team.sort_values('EFFICIENCY', ascending = False).head(15)

# Create the horizontal bar chart
plt.figure(figsize = (10, 6))
barplot = sns.barplot(
    x = 'EFFICIENCY', 
    y = 'TEAM', 
    data = top_15_teams, 
    palette = 'viridis'
)

# Add titles and labels
plt.title('Top 15 Most Efficient Teams', fontsize = 12)
plt.xlabel('Efficiency')
plt.ylabel('Team')

# Label efficiency values on bars
for container in barplot.containers:
    barplot.bar_label(container, fmt='%.2f')

plt.show()