Skip to content
2 hidden cells
2 hidden cells
7 hidden cells
Analyzing Euro 2024 Soccer Data in SQL
| Field | Description | Type |
|---|---|---|
| id | Refers to the unique id of the type | integer |
| game | The name of the game | string |
| event_type | The type of event | string |
| created_at | The time when the event happened | timestamp |
| team_name | The name of the team related to the event | string |
| player_name | The name of the player that performed the event or to whom it happened | string |
| related_player_name | The name of the player who is related to the event that player_name performed | string |
| result | The result after the event happend | string |
| minute | The game minute in which the event happened | integer |
| extra_minute | The additional minutes in which the event happened (e.g. for 90+2 this would be 2) | integer |
| previous_player_event | The event_type of the previous event that happened to the player | string |
| previous_player_event_at | The timestamp of the previous event that happened to the player | timestamp |
| seconds_after_previous_event | The seconds after the previous event happened to the player | integer |
| player_id | The id of the player that performed the event or to whom it happened | integer |
| related_player_id | The id of the player who is related to the event that player_name performed | integer |
| participant_id | The id of the team related to the event | integer |
| fixture_id | The id of the game the event happened in | integer |
2 hidden cells
DataFrameas
df
variable
SELECT *
FROM game_events.csv;Exploratory data analysis
DataFrameas
df1
variable
SELECT
COUNT(1) As rows,
COUNT(DISTINCT(id)) AS unique_id
FROM game_events.csv ;Summary stats
Let's look at the separate tables and their columns using the SUMMARIZE function.
DataFrameas
df2
variable
SUMMARIZE SELECT
*
FROM game_events.csv;Distributions
DataFrameas
df3
variable
-- Distribution of "minute" values
SELECT minute,
COUNT(*) AS events
FROM game_events.csv
GROUP BY 1
ORDER BY 1 ;2 hidden cells
DataFrameas
df4
variable
-- Events in minute 90
SELECT
event_type,
COUNT(1) AS events
FROM game_events.csv
WHERE minute = 90
GROUP BY 1
ORDER BY 2 DESC;
Distribution of extra_minute
DataFrameas
df5
variable
-- Distribution of "extra_minute"
SELECT
extra_minute,
COUNT(1) AS events
FROM game_events.csv
GROUP BY 1;7 hidden cells
DataFrameas
df6
variable
WITH buckets AS (
SELECT DISTINCT
minute,
minute / 5,
ceil(minute / 5),
ceil(minute / 5) * 5 as ceil_bucket
FROM 'game_events.csv'
ORDER BY minute
)
SELECT * FROM buckets