this is the nav!
Data manipulation in SQL
• AI Chat
• Code
• Report
• ## .mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Data Manipulation in SQL

Here you can access every table used in the course. To access each table, you will need to specify the `soccer` schema in your queries (e.g., `soccer.match` for the `match` table, and `soccer.league` for the `league` table).

### Explore Datasets

Use the `match`, `league`, and `country` tables to explore the data and practice your skills!

• Use the `match`, `league`, and `country` tables to return the number of matches played in Great Britain versus elsewhere in the world.
• "England", "Scotland", and "Wales" should be categorized as "Great Britain"
• All other leagues will need to be categorized as "World".
• Use the `match` and `country` tables to return the countries in which the average number of goals (home and away goals) scored are greater than the average number of goals of all matches.
• In a soccer league, points are assigned to teams based on the result of a game. Here, let's assume that 3 points are awarded for a win, 1 for a tie, and 0 for a defeat. Use the `match` table to calculate the running total of points earned by the team "Chelsea" (team id 8455) in the season "2014/2015".
• The final output should have the match date, the points earned by Chelsea, and the running total.

## Chapter 1: CASE statements

### Basic CASE statements

Unknown integration
DataFrameavailable as
df
variable
```.mfe-app-workspace-11z5vno{font-family:JetBrainsMonoNL,Menlo,Monaco,'Courier New',monospace;font-size:13px;line-height:20px;}```-- If we want to compare the number of home team wins, away team wins and ties in the 2013/2014 season:
SELECT
date,
id,
home_goal,
away_goal
FROM soccer.match
WHERE season = '2013/2014'
AND home_goal > away_goal;

-- This is good but not efficient, as you only get home team wins.``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
``````-- Using CASE WHEN we can create a new column that has all 3 categories (Home team win, Away team win, Tie)
SELECT id,
home_goal,
away_goal,
CASE WHEN home_goal > away_goal THEN 'Home Team Win'
WHEN home_goal < away_goal THEN 'Away Team Win'
ELSE 'Tie' END AS outcome
FROM soccer.match
WHERE season = '2013/2014';

-- However, this has only one logical test: it returns outcomes based on whether that test is True or False. So it had only 3 outcomes.``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
``````-- Now we have 2 logical tests:

SELECT date, hometeam_id, awayteam_id,
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!'
ELSE 'Loss or tie :(' END AS outcome
FROM soccer.match
WHERE hometeam_id = 8455 OR awayteam_id = 8455;

-- 1) Is the team Chelsea? AND
-- 2) Did hometeam score higher than away team?
-- If we didn't specify the filters using the WHERE clause, any match not played by Chelsea would be a 'Loss or tie :('``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
``````-- Let's say we are only interested in viewing data from games where Chealsea won. Simply removing the ELSE clause will still retrieve NULL values for the matches we don't care about.

-- To correct this, we can treat the entire CASE statement as a column to filter by in the WHERE clause: include the entire CASE statement except its alias in the WHERE clause.

SELECT date, season,
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!' END AS outcome
FROM soccer.match
WHERE CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
THEN 'Chelsea home win!'
WHEN awayteam_id = 8455 AND home_goal < away_goal
THEN 'Chelsea away win!' END IS NOT NULL;``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

### CASE WHEN with aggregate functions

Unknown integration
DataFrameavailable as
df
variable
``````-- We want to count how many games Chealsea won at home and away

SELECT
season,
COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal
THEN id END) AS home_wins,
COUNT(CASE WHEN awayteam_id = 8650 AND away_goal > home_goal
THEN id END) AS away_wins
FROM soccer.match
GROUP BY season;

-- Instead of returning a string of text, you return the id column identifying the unique match id. As it is inside the COUNT function, it counts every id returned by this CASE statement.``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
``````-- When counting information in a CASE statement, you can return anything you'd like. SQL is counting the number of rows returned by the CASE statement.

SELECT
season,
COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal
THEN 54321 END) AS home_wins,
COUNT(CASE WHEN awayteam_id = 8650 AND away_goal > home_goal
THEN 'some random text' END) AS away_wins
FROM soccer.match
GROUP BY season;

-- The same result as before.``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
``````SELECT
season,
SUM(CASE WHEN hometeam_id = 8650
THEN home_goal END) AS home_goals,
SUM(CASE WHEN awayteam_id = 8650
THEN away_goal END) AS away_goals
FROM soccer.match
GROUP BY season;

-- Now we need the THEN col values for the calculations``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
``````SELECT
season,
ROUND(AVG(CASE WHEN hometeam_id = 8650
THEN home_goal END),2) AS avg_home_goals,
ROUND(AVG(CASE WHEN awayteam_id = 8650
THEN away_goal END),2) AS avg_away_goals
FROM soccer.match
GROUP BY season;``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Unknown integration
DataFrameavailable as
df
variable
``````-- Percentages with CASE and AVG
SELECT
season,
ROUND(AVG(CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
END), 2) AS pct_homewins,
ROUND(AVG(CASE WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 1
WHEN awayteam_id = 8455 AND away_goal < home_goal THEN 0
END), 2) AS pct_awaywins
FROM soccer.match
GROUP BY season;
``````
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.