Skip to content
Analyzing Euro 2024 Soccer Data in SQL .
  • AI Chat
  • Code
  • Report
  • Welcome to an in-depth exploration of Euro 2024, where we delve into Europe's premier football championship using the power of SQL. This comprehensive analysis leverages advanced data queries to uncover the stories behind the stats, revealing key patterns, standout performances, and strategic insights. From goal-scoring trends to defensive strengths, player contributions to team dynamics, our SQL-powered analysis provides a detailed and dynamic view of the tournament's unfolding drama. Get ready to explore the numbers that define the beautiful game and uncover the tactical nuances shaping the path to European glory. Let's embark on this analytical journey and see what the data reveals about Euro 2024!

    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT * FROM 'game_events.csv' LIMIT 100
    FieldDescriptionType
    idRefers to the unique id of the typeinteger
    gameThe name of the gamestring
    event_typeThe type of eventstring
    created_atThe time when the event happenedtimestamp
    team_nameThe name of the team related to the eventstring
    player_nameThe name of the player that performed the event or to whom it happenedstring
    related_player_nameThe name of the player who is related to the event that player_name performedstring
    resultThe result after the event happendstring
    minuteThe game minute in which the event happenedinteger
    extra_minuteThe additional minutes in which the event happened (e.g. for 90+2 this would be 2)integer
    previous_player_eventThe event_type of the previous event that happened to the playerstring
    previous_player_event_atThe timestamp of the previous event that happened to the playertimestamp
    seconds_after_previous_eventThe seconds after the previous event happened to the playerinteger
    player_idThe id of the player that performed the event or to whom it happenedinteger
    related_player_idThe id of the player who is related to the event that player_name performedinteger
    participant_idThe id of the team related to the eventinteger
    fixture_idThe id of the game the event happened ininteger

    🕵️‍♀️ Exploratory data analysis

    Now, we want to further understand the data we work with and potential issues it has.
    For that, so let's look at:

    • Row count
    • Duplicates
    • Missing values
    • Value distributions (i.e. can we see any weird outliers)

    Row counts and uniqueness

    Let's start by looking at the row counts

    Unknown integration
    DataFrameavailable as
    df1
    variable
    SELECT
    	count(1) as rows,
    	count(distinct id) unique_ids
    FROM 'game_events.csv'

    Summary stats

    Let's look at the separate tables and their columns using the SUMMARIZE function.

    Unknown integration
    DataFrameavailable as
    df2
    variable
    SUMMARIZE SELECT * FROM 'game_events.csv'

    Distributions

    Unknown integration
    DataFrameavailable as
    df3
    variable
    -- Distribution of "minute" values
    SELECT 
    	minute ,
    	count(1)as events
    FROM 'game_events.csv'
    GROUP BY 1
    ORDER BY 2
    
    • Seems that most events happen in the 2nd half of the game
    • There are many events in the 90th minute - are those outliers or data issues?

    Let's check what type of events happen in the 90th minute.

    Unknown integration
    DataFrameavailable as
    df4
    variable
    -- Events in minute 90
    SELECT 
    	event_type,
    	count(1) as events
    FROM 'game_events.csv'
    WHERE minute = 90
    GROUP BY 1
    ORDER BY 2 DESC
    Unknown integration
    DataFrameavailable as
    df5
    variable
    -- Distribution of "extra_minute"
    SELECT 
    	extra_minute ,
    	count(1)as events
    FROM 'game_events.csv'
    WHERE extra_minute <> 0
    GROUP BY 1
    ORDER BY 2

    There were 16 events that occurred when extra time was set at 2 minutes. Let's delve into the details and uncover the types of events that took place during this critical period.