Skip to content
Welcome to an in-depth exploration of Euro 2024, where we delve into Europe's premier football championship using the power of SQL. This comprehensive analysis leverages advanced data queries to uncover the stories behind the stats, revealing key patterns, standout performances, and strategic insights. From goal-scoring trends to defensive strengths, player contributions to team dynamics, our SQL-powered analysis provides a detailed and dynamic view of the tournament's unfolding drama. Get ready to explore the numbers that define the beautiful game and uncover the tactical nuances shaping the path to European glory. Let's embark on this analytical journey and see what the data reveals about Euro 2024!
Unknown integration
DataFrameavailable as
df
variable
SELECT * FROM 'game_events.csv' LIMIT 100
Field | Description | Type |
---|---|---|
id | Refers to the unique id of the type | integer |
game | The name of the game | string |
event_type | The type of event | string |
created_at | The time when the event happened | timestamp |
team_name | The name of the team related to the event | string |
player_name | The name of the player that performed the event or to whom it happened | string |
related_player_name | The name of the player who is related to the event that player_name performed | string |
result | The result after the event happend | string |
minute | The game minute in which the event happened | integer |
extra_minute | The additional minutes in which the event happened (e.g. for 90+2 this would be 2) | integer |
previous_player_event | The event_type of the previous event that happened to the player | string |
previous_player_event_at | The timestamp of the previous event that happened to the player | timestamp |
seconds_after_previous_event | The seconds after the previous event happened to the player | integer |
player_id | The id of the player that performed the event or to whom it happened | integer |
related_player_id | The id of the player who is related to the event that player_name performed | integer |
participant_id | The id of the team related to the event | integer |
fixture_id | The id of the game the event happened in | integer |
🕵️♀️ Exploratory data analysis
Now, we want to further understand the data we work with and potential issues it has.
For that, so let's look at:
- Row count
- Duplicates
- Missing values
- Value distributions (i.e. can we see any weird outliers)
Row counts and uniqueness
Let's start by looking at the row counts
Unknown integration
DataFrameavailable as
df1
variable
SELECT
count(1) as rows,
count(distinct id) unique_ids
FROM 'game_events.csv'
Summary stats
Let's look at the separate tables and their columns using the SUMMARIZE
function.
Unknown integration
DataFrameavailable as
df2
variable
SUMMARIZE SELECT * FROM 'game_events.csv'
Distributions
Unknown integration
DataFrameavailable as
df3
variable
-- Distribution of "minute" values
SELECT
minute ,
count(1)as events
FROM 'game_events.csv'
GROUP BY 1
ORDER BY 2
- Seems that most events happen in the 2nd half of the game
- There are many events in the 90th minute - are those outliers or data issues?
Let's check what type of events happen in the 90th minute.
Unknown integration
DataFrameavailable as
df4
variable
-- Events in minute 90
SELECT
event_type,
count(1) as events
FROM 'game_events.csv'
WHERE minute = 90
GROUP BY 1
ORDER BY 2 DESC
Unknown integration
DataFrameavailable as
df5
variable
-- Distribution of "extra_minute"
SELECT
extra_minute ,
count(1)as events
FROM 'game_events.csv'
WHERE extra_minute <> 0
GROUP BY 1
ORDER BY 2
There were 16 events that occurred when extra time was set at 2 minutes. Let's delve into the details and uncover the types of events that took place during this critical period.