Skip to content

Data Manipulation in SQL

Here you can access every table used in the course. To access each table, you will need to specify the soccer schema in your queries (e.g., soccer.match for the match table, and soccer.league for the league table).


Note: When using sample integrations such as those that contain course data, you have read-only access. You can run queries, but cannot make any changes such as adding, deleting, or modifying the data (e.g., creating tables, views, etc.).

Take Notes

Add notes about the concepts you've learned and SQL cells with queries you want to keep.

Add your notes here

Spinner
DataFrameas
df
variable
-- Add your own queries here
SELECT *
FROM soccer.country
LIMIT 5

SCHEMA

team row_number id team_api_id team_long_name team_short_name

match row_number id country_id season stage date hometeam_id awayteam_id home_goal away_goal

league row_number id country_id name

country row_number id name

Explore Datasets

Use the match, league, and country tables to explore the data and practice your skills!

  • Use the match, league, and country tables to return the number of matches played in Great Britain versus elsewhere in the world.
    • "England", "Scotland", and "Wales" should be categorized as "Great Britain"
    • All other leagues will need to be categorized as "World".
  • Use the match and country tables to return the countries in which the average number of goals (home and away goals) scored are greater than the average number of goals of all matches.
  • In a soccer league, points are assigned to teams based on the result of a game. Here, let's assume that 3 points are awarded for a win, 1 for a tie, and 0 for a defeat. Use the match table to calculate the running total of points earned by the team "Chelsea" (team id 8455) in the season "2014/2015".
    • The final output should have the match date, the points earned by Chelsea, and the running total.
Spinner
DataFrameas
df4
variable
/* Paso 1 Obtener el número de partidos jugados en cada país. Filtrar por cuántos se juegan en casa.
Paso 2: Agrupar GB (ENG, SCO, WAL) y comparar contra el resto del mundo.
Paso 3: Jugar con los cálculos - GB vs EU; vs LATAM; obtener %; stats de victorias; stats de goles
Buscar usar JOINS, CTE, Fetching, Ranking, Window functions, etc
*/
WITH total_matches AS (
	SELECT
	  	c.name as pais,
	  	COUNT(*) as matches_played
	  FROM soccer.match
	  LEFT JOIN soccer.country as c
	  ON match.country_id = c.id
	  GROUP BY pais ),
gb_matches AS (
	SELECT
		SUM(matches_played)
	FROM total_matches
	WHERE pais IN ('England','Scotland','Wales')
	),
world_matches AS (
	SELECT
		SUM(matches_played)
	FROM total_matches
	WHERE pais NOT IN ('England','Scotland','Wales')
	)

SELECT
	CASE WHEN pais IN ('England','Scotland','Wales')
	THEN 
	,
	matches_played
FROM total_matches
/*	(SELECT pais, matches_played
	 FROM total_matches
	 WHERE pais IN ('England','Scotland','Wales')) as gb,
	 (SELECT pais, matches_played
	 FROM total_matches
	 WHERE pais NOT IN ('England','Scotland','Wales')) as world
*/
--ORDER BY pais
--GROUP BY pais
LIMIT 10;
Spinner
DataFrameas
df5
variable
/* SQL practice: Use the match and country tables to return the countries in which the average number of goals (home and away goals) scored are greater than the average number of goals of all matches. */

SELECT
	c.name as pais,
	AVG(m.home_goal + m.away_goal) as avg_goals
FROM soccer.match as m
INNER JOIN soccer.country as c
	ON c.id = m.country_id
GROUP BY pais
HAVING AVG(m.home_goal + m.away_goal) >=
		(SELECT AVG(home_goal + away_goal)
		 FROM soccer.match)
ORDER BY avg_goals DESC;
Spinner
DataFrameas
df6
variable
/* SQL practice: In a soccer league, points are assigned to teams based on the result of a game. Here, let's assume that 3 points are awarded for a win, 1 for a tie, and 0 for a defeat. Use the match table to calculate the running total of points earned by the team "Chelsea" (team id 8455) in the season "2014/2015".
The final output should have the match date, the points earned by Chelsea, and the running total.*/
WITH match_points AS(
	SELECT
	id,
	CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
	 	THEN 3
	 	WHEN awayteam_id = 8455 AND away_goal > home_goal
		 THEN 3
	 	WHEN home_goal = away_goal
	 	THEN 1
	 	ELSE 0 END AS points
	FROM soccer.match
),
teams AS(
	SELECT *
	FROM soccer.team	
)

SELECT
	DISTINCT f1.date,
	(SELECT t.team_long_name
	 FROM teams as t
	 WHERE hometeam_id = t.team_api_id) as home,
	f1.home_goal,
	(SELECT t.team_long_name
	 FROM teams as t
	 WHERE awayteam_id = t.team_api_id) as away,
	f1.away_goal,
	CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
	 	THEN 'Win'
	 	WHEN awayteam_id = 8455 AND away_goal > home_goal
		 THEN 'Win'
	 	WHEN home_goal = away_goal
	 	THEN 'Tie'
	 	ELSE 'Loss' END AS match_outcome,
	mp.points,
	SUM(mp.points) OVER(ORDER BY f1.date ROWS BETWEEN
					  UNBOUNDED PRECEDING AND CURRENT ROW) AS total_points
FROM (SELECT *
	  FROM soccer.match
	  WHERE (hometeam_id = 8455 OR awayteam_id = 8455)
	  	AND season = '2014/2015') as f1
INNER JOIN match_points AS mp
	ON f1.id = mp.id
ORDER BY date;

/* Ganar puntos
CASE WHEN hometeam_id = 8455 AND home_goal > away_goal
	 THEN 'Chelsea win'
	 WHEN awayteam_id = 8455 AND away_goal > home_goal
	 THEN 'Chelsea win'
	 WHEN home_goal = away_goal
	 THEN 'Tie'
	 ELSE 'Chelsea lose' END AS outcome
*/
Spinner
DataFrameas
df1
variable
-- Complete the subquery inside the FROM clause. Select the country name from the country table, along with the date, the home goal, the away goal, and the total goals columns from the match table.
-- Create a column in the subquery that adds home and away goals, called total_goals. This will be used to filter the main query.
-- Select the country, date, home goals, and away goals in the main query.
-- Filter the main query for games with 10 or more total goals.
--
SELECT
	-- Select country, date, home, and away goals from the subquery
    country,
    date,
    home_goal,
    away_goal
FROM 
	-- Select country name, date, home_goal, away_goal, and total goals in the subquery
	(SELECT c.name AS country, 
     	    m.date, 
     		m.home_goal, 
     		m.away_goal,
           (m.home_goal + m.away_goal) AS total_goals
    FROM match AS m
    LEFT JOIN country AS c
    ON m.country_id = c.id) AS subquery
-- Filter by total goals scored in the main query
WHERE total_goals >= 10;
Spinner
DataFrameas
df2
variable
--
SELECT
	m.date,
    -- Get the home and away team names
    hometeam,
    awayteam,
    m.home_goal,
    m.away_goal
FROM match AS m

-- Join the home subquery to the match table
LEFT JOIN (
  SELECT match.id, team.team_long_name AS hometeam
  FROM match
  LEFT JOIN team
  ON match.hometeam_id = team.team_api_id) AS home
ON home.id = m.id

-- Join the away subquery to the match table
LEFT JOIN (
  SELECT match.id, team.team_long_name AS awayteam
  FROM match
  LEFT JOIN team
  -- Get the away team ID in the subquery
  ON match.awayteam_id = team.team_api_id) AS away
ON away.id = m.id;
Spinner
DataFrameas
df3
variable
/* This is the SQL code for the last exercise in the module. We will determine which teams beat Manchester United in the 2014/2015 season.*/
/* 01: Your first task is to create the first query that filters for matches where Manchester United played as the home team. This will become a common table expression in a later exercise.*/
SELECT 
	m.id, 
    t.team_long_name,
    -- Identify matches as home/away wins or ties
	CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
		WHEN m.home_goal < m.away_goal THEN 'MU Loss'
        ELSE 'Tie' END AS outcome
FROM match AS m
-- Left join team on the home team ID and team API id
LEFT JOIN team AS t 
ON m.hometeam_id = t.team_api_id
WHERE 
	-- Filter for 2014/2015 and Manchester United as the home team
	season = '2014/2015'
	AND t.team_long_name = 'Manchester United';


/*
02: Great job! Now that you have a query identifying the home team in a match, you will perform a similar set of steps to identify the away team. Just like the previous step, you will join the match and team tables. Each of these two queries will be declared as a Common Table Expression in the following step.

The primary difference in this query is that you will be joining the tables on awayteam_id, and reversing the match outcomes in the CASE statement.

When altering CASE statement logic in your own work, you can reverse either the logical condition (i.e., home_goal > away_goal) or the outcome in THEN -- just make sure you only reverse one of the two!
*/
SELECT 
	m.id, 
    t.team_long_name,
    -- Identify matches as home/away wins or ties
	CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
		WHEN m.home_goal < m.away_goal THEN 'MU Win'
        ELSE 'Tie' END AS outcome
-- Join team table to the match table
FROM match AS m
LEFT JOIN team AS t 
ON m.awayteam_id = t.team_api_id
WHERE 
	-- Filter for 2014/2015 and Manchester United as the away team
	season = '2014/2015'
	AND t.team_long_name = 'Manchester United';


/*
03: Now that you've created the two subqueries identifying the home and away team opponents, it's time to rearrange your query with the home and away subqueries as Common Table Expressions (CTEs). You'll notice that the main query includes the phrase, SELECT DISTINCT. Without identifying only DISTINCT matches, you will return a duplicate record for each game played.

Continue building the query to extract all matches played by Manchester United in the 2014/2015 season.*/
-- Set up the home team CTE
WITH home AS (
  SELECT m.id, t.team_long_name,
	  CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
		   WHEN m.home_goal < m.away_goal THEN 'MU Loss' 
  		   ELSE 'Tie' END AS outcome
  FROM match AS m
  LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
-- Set up the away team CTE
away AS (
  SELECT m.id, t.team_long_name,
	  CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
		   WHEN m.home_goal < m.away_goal THEN 'MU Loss' 
  		   ELSE 'Tie' END AS outcome
  FROM match AS m
  LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
-- Select team names, the date and goals
SELECT DISTINCT
    date,
    home.team_long_name home_team,
    away.team_long_name AS away_team,
    m.home_goal,
    m.away_goal
-- Join the CTEs onto the match table
FROM match AS m
INNER JOIN home ON m.id = home.id
INNER JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
      AND (home.team_long_name = 'Manchester United' 
           OR away.team_long_name = 'Manchester United');


/*
04: Fantastic! You now have a result set that retrieves the match date, home team, away team, and the goals scored by each team. You have one final component of the question left -- how badly did Manchester United lose in each match?

In order to determine this, let's add a window function to the main query that ranks matches by the absolute value of the difference between home_goal and away_goal. This allows us to directly compare the difference in scores without having to consider whether Manchester United played as the home or away team!

The equation is complete for you -- all you need to do is properly complete the window function!*/
-- Set up the home team CTE
WITH home AS (
  SELECT m.id, t.team_long_name,
	  CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
		   WHEN m.home_goal < m.away_goal THEN 'MU Loss' 
  		   ELSE 'Tie' END AS outcome
  FROM match AS m
  LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
-- Set up the away team CTE
away AS (
  SELECT m.id, t.team_long_name,
	  CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
		   WHEN m.home_goal < m.away_goal THEN 'MU Win' 
  		   ELSE 'Tie' END AS outcome
  FROM match AS m
  LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
-- Select columns and and rank the matches by goal difference
SELECT DISTINCT
    date,
    home.team_long_name AS home_team,
    away.team_long_name AS away_team,
    m.home_goal, m.away_goal,
    RANK() OVER(ORDER BY ABS(home_goal - away_goal) DESC) as match_rank
-- Join the CTEs onto the match table
FROM match AS m
INNER JOIN home ON m.id = home.id
INNER JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
      AND ((home.team_long_name = 'Manchester United' AND home.outcome = 'MU Loss')
      OR (away.team_long_name = 'Manchester United' AND away.outcome = 'MU Loss'));