Skip to content
Intermediate SQL - Soccer Match Analysis
  • AI Chat
  • Code
  • Report
  • SQL - Soccer match analysis

    Hey there soccer and SQL enthusiasts! Welcome to our little corner of the internet where we'll be diving deep into the fascinating world of soccer data analysis using SQL. In this portfolio blog, we're on a mission to unravel the mysteries of SQL CASE statements, subqueries, correlated subqueries, window functions, and aggregations, all while kicking around some soccer stats.

    Get ready to score some SQL skills as we explore real-world scenarios with our soccer database. From figuring out match outcomes to analyzing team performances, we're going to tackle it all with a blend of SQL magic and soccer passion.

    Whether you're a data aficionado, a soccer fanatic, or just curious about what happens when data meets the beautiful game, you're in for a treat. So grab your virtual cleats, and let's hit the pitch for some SQL action!

    Objectives

    • SQL CASE Statements: Categorize data with conditional logic, crafting customized output fields within SQL queries.
    • Mastering Subqueries: Retrieve data from nested SELECT statements, exploring scalar, column, and table subqueries for filtering, joining, and aggregating data.
    • Unraveling Correlated Subqueries: Understand how correlated subqueries differ, referencing outer query data for complex filtering and manipulation.
    • Harnessing Window Functions: Utilize window functions for advanced analytical tasks, calculating aggregate values over subsets of data for deeper insights.
    • Applying Aggregations: Investigate SUM, COUNT, AVG, MIN, and MAX functions, applying them to summarize and analyze data, with various aggregation techniques.
    • Practical Application with Soccer Database: Analyze match data, team performances, and league standings using SQL techniques.
    • Enhancing SQL Proficiency: Improve skills with exercises and examples, practicing efficient and effective SQL queries for problem-solving.

    Understanding Data

    Match Table:

    - Schema: match - Columns: - id: Unique identifier for each match. - country_id: Foreign key referencing the country in which the match was played. - season: Season of the match (e.g., "2019-2020"). - stage: Stage of the competition. - date: Date of the match. - hometeam_id: Foreign key referencing the home team. - awayteam_id: Foreign key referencing the away team. - home_goal: Number of goals scored by the home team. - away_goal: Number of goals scored by the away team.

    Country Table:

    - Schema: country - Columns: - id: Unique identifier for each country. - name: Name of the country.

    Team Table:

    - Schema: team - Columns: - id: Unique identifier for each team. - team_api_id: API identifier for the team. - team_long_name: Full name of the team. - team_short_name: Short name or abbreviation of the team.

    League Table:

    - Schema: league - Columns: - id: Unique identifier for each league. - country_id: Foreign key referencing the country in which the league is played. - name: Name of the league.

    Let's Dive In: Exploring Soccer Data with SQL

    Let's take a closer look at each table in our SQL database to gain a deeper understanding of the data structure and relationships within.

    Unknown integration
    DataFrameavailable as
    df45
    variable
     -- display match table
     SELECT * 
     FROM soccer.match
     LIMIT 5;
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- display team table
    SELECT *
    FROM soccer.team
    LIMIT 5;
    Unknown integration
    DataFrameavailable as
    df46
    variable
    -- display country table
    SELECT *
    FROM soccer.country
    LIMIT 5;
    Unknown integration
    DataFrameavailable as
    df47
    variable
    -- display league table
    SELECT *
    FROM soccer.league
    LIMIT 5;

    Alright, folks, buckle up! We're about to embark on an exciting journey through the world of soccer data analysis. 🚀

    we've been digging into our soccer database, crunching numbers, and unraveling insights. But guess what? We've stumbled upon a challenge that's got us scratching our heads. 🤔

    We want to do something really cool - like identifying matches where FC Barcelona faced Real Madrid CF and classifying the outcomes based on who scored more goals. Sounds like a job for SQL CASE statements, right? 💡

    But hey, it's not as simple as slicing through a defense. We need to think smart and use our SQL skills to tackle this challenge head-on.

    Let's dive back into our SQL queries and see how we can pull off this feat using some fancy footwork with CASE statements. Strap in, folks, it's about to get interesting! 💥

    Using CASE Statements

    Ever wondered how many matches FC Schalke 04 and FC Bayern Munich have played?

    Unknown integration
    DataFrameavailable as
    df1
    variable
    SELECT
    	-- Select the team long name and team API id
    	team_long_name,
    	team_api_id
    FROM soccer.team
    -- Only include FC Schalke 04 and FC Bayern Munich
    WHERE team_long_name IN ('FC Schalke 04', 'FC Bayern Munich');

    Let's find out how many matches each team has played, but with a twist! We're not just counting matches; we're categorizing them based on whether they were played at home by FC Schalke 04, FC Bayern Munich, or neither.

    Unknown integration
    DataFrameavailable as
    df2
    variable
    -- Identify the home team as Bayern Munich, Schalke 04, or neither
    SELECT 
    	CASE WHEN hometeam_id = 10189 THEN 'FC Schalke 04'
            WHEN hometeam_id = 9823 THEN 'FC Bayern Munich'
             ELSE 'Other' END AS home_team,
    	COUNT(id) AS total_matches
    FROM soccer.match
    -- Group by the CASE statement alias
    GROUP BY home_team;
    ‌
    ‌
    ‌