Skip to content
Data manipulation in SQL
  • AI Chat
  • Code
  • Report
  • Spinner

    Data Manipulation in SQL

    Here you can access every table used in the course. To access each table, you will need to specify the soccer schema in your queries (e.g., soccer.match for the match table, and soccer.league for the league table).

    Explore Datasets

    Use the match, league, and country tables to explore the data and practice your skills!

    • Use the match, league, and country tables to return the number of matches played in Great Britain versus elsewhere in the world.
      • "England", "Scotland", and "Wales" should be categorized as "Great Britain"
      • All other leagues will need to be categorized as "World".
    • Use the match and country tables to return the countries in which the average number of goals (home and away goals) scored are greater than the average number of goals of all matches.
    • In a soccer league, points are assigned to teams based on the result of a game. Here, let's assume that 3 points are awarded for a win, 1 for a tie, and 0 for a defeat. Use the match table to calculate the running total of points earned by the team "Chelsea" (team id 8455) in the season "2014/2015".
      • The final output should have the match date, the points earned by Chelsea, and the running total.

    Chapter 1: CASE statements

    Basic CASE statements

    Unknown integration
    DataFrameavailable as
    df
    variable
    -- If we want to compare the number of home team wins, away team wins and ties in the 2013/2014 season: 
    SELECT
        date,
        id,
        home_goal,
        away_goal
    FROM soccer.match
    WHERE season = '2013/2014'
          AND home_goal > away_goal;
          
    -- This is good but not efficient, as you only get home team wins.
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Using CASE WHEN we can create a new column that has all 3 categories (Home team win, Away team win, Tie)
    SELECT id,
      home_goal,
      away_goal,
      CASE WHEN home_goal > away_goal THEN 'Home Team Win'
           WHEN home_goal < away_goal THEN 'Away Team Win'
           ELSE 'Tie' END AS outcome
    FROM soccer.match
    WHERE season = '2013/2014';
    
    -- However, this has only one logical test: it returns outcomes based on whether that test is True or False. So it had only 3 outcomes.
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Now we have 2 logical tests:
    
    SELECT date, hometeam_id, awayteam_id,
      CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
                THEN 'Chelsea home win!'
           WHEN awayteam_id = 8455 AND home_goal < away_goal
                THEN 'Chelsea away win!'
           ELSE 'Loss or tie :(' END AS outcome
    FROM soccer.match
    WHERE hometeam_id = 8455 OR awayteam_id = 8455;
    
    -- 1) Is the team Chelsea? AND 
    -- 2) Did hometeam score higher than away team?
    -- If we didn't specify the filters using the WHERE clause, any match not played by Chelsea would be a 'Loss or tie :('
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Let's say we are only interested in viewing data from games where Chealsea won. Simply removing the ELSE clause will still retrieve NULL values for the matches we don't care about.
    
    -- To correct this, we can treat the entire CASE statement as a column to filter by in the WHERE clause: include the entire CASE statement except its alias in the WHERE clause.
    
    SELECT date, season,
      CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
                THEN 'Chelsea home win!'
           WHEN awayteam_id = 8455 AND home_goal < away_goal
                THEN 'Chelsea away win!' END AS outcome
    FROM soccer.match
    WHERE CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
                    THEN 'Chelsea home win!'
               WHEN awayteam_id = 8455 AND home_goal < away_goal
                    THEN 'Chelsea away win!' END IS NOT NULL;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    CASE WHEN with aggregate functions

    Unknown integration
    DataFrameavailable as
    df
    variable
    -- We want to count how many games Chealsea won at home and away
    
    SELECT
        season,
        COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal
              THEN id END) AS home_wins,
        COUNT(CASE WHEN awayteam_id = 8650 AND away_goal > home_goal
              THEN id END) AS away_wins
    FROM soccer.match
    GROUP BY season;
    
    -- Instead of returning a string of text, you return the id column identifying the unique match id. As it is inside the COUNT function, it counts every id returned by this CASE statement.
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- When counting information in a CASE statement, you can return anything you'd like. SQL is counting the number of rows returned by the CASE statement.
    
    SELECT
        season,
        COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal
              THEN 54321 END) AS home_wins,
        COUNT(CASE WHEN awayteam_id = 8650 AND away_goal > home_goal
              THEN 'some random text' END) AS away_wins
    FROM soccer.match
    GROUP BY season;
    
    -- The same result as before.
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT
        season,
        SUM(CASE WHEN hometeam_id = 8650
              THEN home_goal END) AS home_goals,
        SUM(CASE WHEN awayteam_id = 8650
              THEN away_goal END) AS away_goals
    FROM soccer.match
    GROUP BY season;
    
    -- Now we need the THEN col values for the calculations
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT
      season,
      ROUND(AVG(CASE WHEN hometeam_id = 8650
            THEN home_goal END),2) AS avg_home_goals,
      ROUND(AVG(CASE WHEN awayteam_id = 8650
            THEN away_goal END),2) AS avg_away_goals
    FROM soccer.match
    GROUP BY season;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Percentages with CASE and AVG
    SELECT
      season,
      ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
               WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
               END), 2) AS pct_homewins,
      ROUND(AVG(CASE WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 1
               WHEN awayteam_id = 8455 AND away_goal < home_goal THEN 0
               END), 2) AS pct_awaywins
    FROM soccer.match
    GROUP BY season;
    
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.