Skip to content

Welcome to an in-depth exploration of Euro 2024, where we delve into Europe's premier football championship using the power of SQL. This comprehensive analysis leverages advanced data queries to uncover the stories behind the stats, revealing key patterns, standout performances, and strategic insights. From goal-scoring trends to defensive strengths, player contributions to team dynamics, our SQL-powered analysis provides a detailed and dynamic view of the tournament's unfolding drama. Get ready to explore the numbers that define the beautiful game and uncover the tactical nuances shaping the path to European glory. Let's embark on this analytical journey and see what the data reveals about Euro 2024!

Spinner
DataFrameas
df
variable
SELECT * FROM 'game_events.csv' LIMIT 100
FieldDescriptionType
idRefers to the unique id of the typeinteger
gameThe name of the gamestring
event_typeThe type of eventstring
created_atThe time when the event happenedtimestamp
team_nameThe name of the team related to the eventstring
player_nameThe name of the player that performed the event or to whom it happenedstring
related_player_nameThe name of the player who is related to the event that player_name performedstring
resultThe result after the event happendstring
minuteThe game minute in which the event happenedinteger
extra_minuteThe additional minutes in which the event happened (e.g. for 90+2 this would be 2)integer
previous_player_eventThe event_type of the previous event that happened to the playerstring
previous_player_event_atThe timestamp of the previous event that happened to the playertimestamp
seconds_after_previous_eventThe seconds after the previous event happened to the playerinteger
player_idThe id of the player that performed the event or to whom it happenedinteger
related_player_idThe id of the player who is related to the event that player_name performedinteger
participant_idThe id of the team related to the eventinteger
fixture_idThe id of the game the event happened ininteger

🕵️‍♀️ Exploratory data analysis

Now, we want to further understand the data we work with and potential issues it has.
For that, so let's look at:

  • Row count
  • Duplicates
  • Missing values
  • Value distributions (i.e. can we see any weird outliers)

Row counts and uniqueness

Let's start by looking at the row counts

Spinner
DataFrameas
df1
variable
SELECT
	count(1) as rows,
	count(distinct id) unique_ids
FROM 'game_events.csv'

Summary stats

Let's look at the separate tables and their columns using the SUMMARIZE function.

Spinner
DataFrameas
df2
variable
SUMMARIZE SELECT * FROM 'game_events.csv'

Distributions

Spinner
DataFrameas
df3
variable
-- Distribution of "minute" values
SELECT 
	minute ,
	count(1)as events
FROM 'game_events.csv'
GROUP BY 1
ORDER BY 2
  • Seems that most events happen in the 2nd half of the game
  • There are many events in the 90th minute - are those outliers or data issues?

Let's check what type of events happen in the 90th minute.

Spinner
DataFrameas
df4
variable
-- Events in minute 90
SELECT 
	event_type,
	count(1) as events
FROM 'game_events.csv'
WHERE minute = 90
GROUP BY 1
ORDER BY 2 DESC
Spinner
DataFrameas
df5
variable
-- Distribution of "extra_minute"
SELECT 
	extra_minute ,
	count(1)as events
FROM 'game_events.csv'
WHERE extra_minute <> 0
GROUP BY 1
ORDER BY 2

There were 16 events that occurred when extra time was set at 2 minutes. Let's delve into the details and uncover the types of events that took place during this critical period.