Skip to content
UEFA Insights – 3. Most Efficient Teams
⚽ UEFA Insights – 3. Most Efficient Teams (Shots on Target per Shot Attempt)
Main Goal To create visualizations that showcase the teams with the highest efficiency in converting shot attempts into shots on target during the UEFA Championship stages for the 2020, 2021, and 2022 seasons.
Data source: UEFA match data provided in DataCamp's Data Sources repository for DataLab projects hosted on Snowflake.
Dataset Overview
Quick overview of a dataset for UEFA season 2020, SOCCER.TBL_UEFA_2020
DataFrameas
df
variable
-- Explore the data in the table
SELECT *
FROM SOCCER.TBL_UEFA_2020
LIMIT 5SQL Query Setup
- The query combines match data from three different tables (2020, 2021, 2022) into a single dataset.
- It calculates the total number of shots and shots on target for each team, both home and away.
- Finally, it computes the shooting efficiency for each team and orders the results by efficiency in descending order.
- Output is saved in a data frame
shooting_efficiency_per_teamfor later visualization.
DataFrameas
shooting_efficiency_per_team
variable
WITH all_matches AS (
SELECT * FROM SOCCER.TBL_UEFA_2020
UNION ALL
SELECT * FROM SOCCER.TBL_UEFA_2021
UNION ALL
SELECT * FROM SOCCER.TBL_UEFA_2022
),
all_shots AS (
SELECT
TEAM_NAME_HOME AS TEAM,
TOTAL_SHOTS_HOME AS SHOTS,
SHOTS_ON_TARGET_HOME AS TARGETS
FROM all_matches
UNION ALL
SELECT
TEAM_NAME_AWAY AS TEAM,
TOTAL_SHOTS_AWAY AS SHOTS,
SHOTS_ON_TARGET_AWAY AS TARGETS
FROM all_matches
),
team_total_shots AS (
SELECT
TEAM,
SUM(SHOTS) AS TOTAL_SHOTS,
SUM(TARGETS) AS TOTAL_TARGETS
FROM all_shots
GROUP BY TEAM
)
SELECT
TEAM,
TOTAL_TARGETS,
TOTAL_SHOTS,
ROUND(TOTAL_TARGETS / NULLIF(TOTAL_SHOTS, 0), 3) AS EFFICIENCY
FROM team_total_shots
ORDER BY EFFICIENCY DESC;Visualization Setup
Data Source: Data frame shooting_efficiency_per_team generated by a SQL query.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as snsData Frame Preview
Quick preview of a data frame shooting_efficiency_per_team obtained as a result of the SQL query execution.
shooting_efficiency_per_team.sort_values('EFFICIENCY', ascending = False).head()Potential Data Discrepancies
Upon reviewing the query output, an anomaly was detected in the computed metric 'EFFICIENCY', necessitating further examination...
# Analyze the 'EFFICIENCY' column to check for values above 1.0
anomalies = shooting_efficiency_per_team[shooting_efficiency_per_team['EFFICIENCY'] > 1.0]
# Display the rows where 'EFFICIENCY' is above 1.0
anomaliesAnomaly Analysis
- 'EFFICIENCY' is calculated as TOTAL_TARGETS / TOTAL_SHOTS.
- Since TOTAL_TARGETS cannot exceed TOTAL_SHOTS, 'EFFICIENCY' should not logically be above 1.0.
- If there are any values above 1.0, they are likely due to data errors and should be reviewed or corrected.
# Correct the anomalies by setting 'EFFICIENCY' to NaN
shooting_efficiency_per_team.loc[shooting_efficiency_per_team['EFFICIENCY'] > 1.0, 'EFFICIENCY'] = float('nan')
# Display the corrected data frame
shooting_efficiency_per_team.sort_values('EFFICIENCY', ascending = False).head()Anomaly Correction
- By setting 'EFFICIENCY' to NaN for values above 1.0, we effectively mark these entries as invalid or erroneous.
- This allows for easy identification and handling of these anomalies in subsequent data processing steps.
- Using NaN ensures that these values are excluded from calculations and analyses, preventing them from skewing results.
# Sort the data by 'EFFICIENCY' in descending order and select the top 15 teams
top_15_teams = shooting_efficiency_per_team.sort_values('EFFICIENCY', ascending = False).head(15)
# Create the horizontal bar chart
plt.figure(figsize = (10, 6))
barplot = sns.barplot(
x = 'EFFICIENCY',
y = 'TEAM',
data = top_15_teams,
palette = 'viridis'
)
# Add titles and labels
plt.title('Top 15 Most Efficient Teams', fontsize = 12)
plt.xlabel('Efficiency')
plt.ylabel('Team')
# Label efficiency values on bars
for container in barplot.containers:
barplot.bar_label(container, fmt='%.2f')
plt.show()