Analyzing Euro 2024 Soccer Data in SQL

Field	Description	Type
id	Refers to the unique id of the type	integer
game	The name of the game	string
event_type	The type of event	string
created_at	The time when the event happened	timestamp
team_name	The name of the team related to the event	string
player_name	The name of the player that performed the event or to whom it happened	string
related_player_name	The name of the player who is related to the event that `player_name` performed	string
result	The result after the event happend	string
minute	The game minute in which the event happened	integer
extra_minute	The additional minutes in which the event happened (e.g. for 90+2 this would be 2)	integer
previous_player_event	The `event_type` of the previous event that happened to the player	string
previous_player_event_at	The timestamp of the previous event that happened to the player	timestamp
seconds_after_previous_event	The seconds after the previous event happened to the player	integer
player_id	The id of the player that performed the event or to whom it happened	integer
related_player_id	The id of the player who is related to the event that `player_name` performed	integer
participant_id	The id of the team related to the event	integer
fixture_id	The id of the game the event happened in	integer

2 hidden cells

DataFrameas

df

variable

SELECT *
FROM game_events.csv;

Exploratory data analysis

DataFrameas

df1

variable

SELECT 
	COUNT(1) As rows,
	COUNT(DISTINCT(id)) AS unique_id
FROM game_events.csv ;

Summary stats

Let's look at the separate tables and their columns using the SUMMARIZE function.

DataFrameas

df2

variable

SUMMARIZE SELECT 
				*
		  FROM game_events.csv;

Distributions

DataFrameas

df3

variable

-- Distribution of "minute" values
SELECT minute,
	   COUNT(*) AS events
FROM game_events.csv
GROUP BY 1
ORDER BY 1 ;

2 hidden cells

DataFrameas

df4

variable

-- Events in minute 90
SELECT
	event_type,
	COUNT(1) AS events
FROM  game_events.csv
WHERE minute = 90
GROUP BY 1
ORDER BY 2 DESC;

Distribution of extra_minute

DataFrameas

df5

variable

-- Distribution of "extra_minute"
SELECT
	extra_minute,
	COUNT(1) AS events
FROM  game_events.csv
GROUP BY 1;

7 hidden cells

DataFrameas

df6

variable


WITH buckets AS (
	SELECT DISTINCT
		minute,
		minute / 5,
		ceil(minute / 5),
		ceil(minute / 5) * 5 as ceil_bucket
	FROM 'game_events.csv'
	ORDER BY minute
)

SELECT * FROM buckets

‌
‌
‌

Analyzing Euro 2024 Soccer Data in SQL

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Summary stats

Distributions

Summary stats