Analyzing Euro 2024 Soccer Data in SQL .

Welcome to an in-depth exploration of Euro 2024, where we delve into Europe's premier football championship using the power of SQL. This comprehensive analysis leverages advanced data queries to uncover the stories behind the stats, revealing key patterns, standout performances, and strategic insights. From goal-scoring trends to defensive strengths, player contributions to team dynamics, our SQL-powered analysis provides a detailed and dynamic view of the tournament's unfolding drama. Get ready to explore the numbers that define the beautiful game and uncover the tactical nuances shaping the path to European glory. Let's embark on this analytical journey and see what the data reveals about Euro 2024!

DataFrameas

df

variable

SELECT * FROM 'game_events.csv' LIMIT 100

Field	Description	Type
id	Refers to the unique id of the type	integer
game	The name of the game	string
event_type	The type of event	string
created_at	The time when the event happened	timestamp
team_name	The name of the team related to the event	string
player_name	The name of the player that performed the event or to whom it happened	string
related_player_name	The name of the player who is related to the event that `player_name` performed	string
result	The result after the event happend	string
minute	The game minute in which the event happened	integer
extra_minute	The additional minutes in which the event happened (e.g. for 90+2 this would be 2)	integer
previous_player_event	The `event_type` of the previous event that happened to the player	string
previous_player_event_at	The timestamp of the previous event that happened to the player	timestamp
seconds_after_previous_event	The seconds after the previous event happened to the player	integer
player_id	The id of the player that performed the event or to whom it happened	integer
related_player_id	The id of the player who is related to the event that `player_name` performed	integer
participant_id	The id of the team related to the event	integer
fixture_id	The id of the game the event happened in	integer

🕵️‍♀️ Exploratory data analysis

Now, we want to further understand the data we work with and potential issues it has.
For that, so let's look at:

Row count
Duplicates
Missing values
Value distributions (i.e. can we see any weird outliers)

Row counts and uniqueness

Let's start by looking at the row counts

DataFrameas

df1

variable

SELECT
	count(1) as rows,
	count(distinct id) unique_ids
FROM 'game_events.csv'

Summary stats

Let's look at the separate tables and their columns using the SUMMARIZE function.

DataFrameas

df2

variable

SUMMARIZE SELECT * FROM 'game_events.csv'

Distributions

DataFrameas

df3

variable

-- Distribution of "minute" values
SELECT 
	minute ,
	count(1)as events
FROM 'game_events.csv'
GROUP BY 1
ORDER BY 2

Seems that most events happen in the 2nd half of the game
There are many events in the 90th minute - are those outliers or data issues?

Let's check what type of events happen in the 90th minute.

DataFrameas

df4

variable

-- Events in minute 90
SELECT 
	event_type,
	count(1) as events
FROM 'game_events.csv'
WHERE minute = 90
GROUP BY 1
ORDER BY 2 DESC

DataFrameas

df5

variable

-- Distribution of "extra_minute"
SELECT 
	extra_minute ,
	count(1)as events
FROM 'game_events.csv'
WHERE extra_minute <> 0
GROUP BY 1
ORDER BY 2

There were 16 events that occurred when extra time was set at 2 minutes. Let's delve into the details and uncover the types of events that took place during this critical period.

‌
‌
‌