Skip to content
FieldDescriptionType
idRefers to the unique id of the typeinteger
gameThe name of the gamestring
event_typeThe type of eventstring
created_atThe time when the event happenedtimestamp
team_nameThe name of the team related to the eventstring
player_nameThe name of the player that performed the event or to whom it happenedstring
related_player_nameThe name of the player who is related to the event that player_name performedstring
resultThe result after the event happendstring
minuteThe game minute in which the event happenedinteger
extra_minuteThe additional minutes in which the event happened (e.g. for 90+2 this would be 2)integer
previous_player_eventThe event_type of the previous event that happened to the playerstring
previous_player_event_atThe timestamp of the previous event that happened to the playertimestamp
seconds_after_previous_eventThe seconds after the previous event happened to the playerinteger
player_idThe id of the player that performed the event or to whom it happenedinteger
related_player_idThe id of the player who is related to the event that player_name performedinteger
participant_idThe id of the team related to the eventinteger
fixture_idThe id of the game the event happened ininteger

2 hidden cells
Spinner
DataFrameas
df
variable
SELECT *
FROM game_events.csv;

Exploratory data analysis

Spinner
DataFrameas
df1
variable
SELECT 
	COUNT(1) As rows,
	COUNT(DISTINCT(id)) AS unique_id
FROM game_events.csv ;

Summary stats

Let's look at the separate tables and their columns using the SUMMARIZE function.

Spinner
DataFrameas
df2
variable
SUMMARIZE SELECT 
				*
		  FROM game_events.csv;

Distributions

Spinner
DataFrameas
df3
variable
-- Distribution of "minute" values
SELECT minute,
	   COUNT(*) AS events
FROM game_events.csv
GROUP BY 1
ORDER BY 1 ;

2 hidden cells
Spinner
DataFrameas
df4
variable
-- Events in minute 90
SELECT
	event_type,
	COUNT(1) AS events
FROM  game_events.csv
WHERE minute = 90
GROUP BY 1
ORDER BY 2 DESC;

Distribution of extra_minute

Spinner
DataFrameas
df5
variable
-- Distribution of "extra_minute"
SELECT
	extra_minute,
	COUNT(1) AS events
FROM  game_events.csv
GROUP BY 1;

7 hidden cells
Spinner
DataFrameas
df6
variable

WITH buckets AS (
	SELECT DISTINCT
		minute,
		minute / 5,
		ceil(minute / 5),
		ceil(minute / 5) * 5 as ceil_bucket
	FROM 'game_events.csv'
	ORDER BY minute
)

SELECT * FROM buckets