Skip to content
New Workbook
Sign up
Joining Data in SQL

Joining Data with SQL

Here you can access every table used in the course. To access each table, you will need to specify the world schema in your queries (e.g., world.countries for the countries table, and world.languages for the languages table).


Note: When using sample integrations such as those that contain course data, you have read-only access. You can run queries, but cannot make any changes such as adding, deleting, or modifying the data (e.g., creating tables, views, etc.).

Take Notes

Add notes about the concepts you've learned and SQL cells with queries you want to keep.

Add your notes here

Spinner
DataFrameavailable as
world_info
variable
-- Add your own queries here
SELECT *
FROM world.languages 
LIMIT 5
Spinner
DataFrameavailable as
df
variable
SELECT 
    c1.name AS city,
    code,
    c2.name AS country,
    region,
    city_proper_pop
FROM cities AS c1
-- Perform an inner join with cities as c1 and countries as c2 on country code
___
ORDER BY code DESC;

SET OPERATIONS

INTERSECT --> return only there is the same value for both queries UNION --> does not include duplicated values UNION ALL --> include duplicated values too EXCEPT ---> compare 2 tables and return the ones that are not in common

Spinner
DataFrameavailable as
df1
variable
--BY using CTE Tables can be saved to memory and does not consume that much. 
--You may call the table by using 'match_list'

-- Set up your CTE
WITH match_list AS (
    SELECT 
  		country_id,
  	   (home_goal + away_goal) AS goals
    FROM match
  	-- Create a list of match IDs to filter data in the CTE
    WHERE id IN (
       SELECT id
       FROM match
       WHERE season = '2013/2014' AND EXTRACT(MONTH FROM date) = 08))
-- Select the league name and average of goals in the CTE
SELECT 
	name,
    AVG(goals)
FROM league AS l
-- Join the CTE onto the league table
LEFT JOIN match_list ON l.country_id = match_list.country_id
GROUP BY l.name;



----IF YOU WANT TO USE MORE THAN ONCE no need to use WITH again. Just seperate by coma

WITH home AS (
  SELECT m.id, m.date, 
  		 t.team_long_name AS hometeam, m.home_goal
  FROM match AS m
  LEFT JOIN team AS t 
  ON m.hometeam_id = t.team_api_id),
-- Declare and set up the away CTE
 away AS (
  SELECT m.id, m.date, 
  		 t.team_long_name AS awayteam, m.away_goal
  FROM match AS m
  LEFT JOIN team AS t 
  ON m.awayteam_id = t.team_api_id)
-- Select date, home_goal, and away_goal
SELECT 
	home.date,
    home.hometeam,
    away.awayteam,
    home.home_goal,
    away.away_goal
-- Join away and home on the id column
FROM home
INNER JOIN away
ON home.id = away.id;
Spinner
DataFrameavailable as
df2
variable
--By using PARTITION BY without using group by we can seperate and kinda group the result. 
--example

SELECT 
	date,
	season,
	home_goal,
	away_goal,
	CASE WHEN hometeam_id = 8673 THEN 'home' 
         ELSE 'away' END AS warsaw_location,
	-- Calculate average goals partitioned by season and month
    AVG(home_goal) OVER(PARTITION BY season, 
         	EXTRACT(MONTH FROM date)) AS season_mo_home,
    AVG(away_goal) OVER(PARTITION BY season, 
            EXTRACT(MONTH FROM date)) AS season_mo_away
FROM match
WHERE 
	hometeam_id = 8673
    OR awayteam_id = 8673
ORDER BY (home_goal + away_goal) DESC;
Spinner
DataFrameavailable as
df3
variable
---sliding windows example

SELECT 
	date,
	home_goal,
	away_goal,
    -- Create a running total and running average of home goals
    SUM(home_goal) OVER(ORDER BY date 
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
    AVG(home_goal) OVER(ORDER BY date 
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg
FROM match
WHERE 
	hometeam_id = 9908 
	AND season = '2011/2012';
	
	
	----2nd example
	
	
	SELECT 
	-- Select the date, home goal, and away goals
	date,
	home_goal,
	away_goal,
    -- Create a running total and running average of home goals
    SUM(home_goal) OVER(ORDER BY date DESC
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS running_total,
    AVG(home_goal) OVER(ORDER BY date DESC
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS running_avg
FROM match
WHERE 
	awayteam_id = 9908 
    AND season = '2011/2012';

The bounds can be any of these five options:

UNBOUNDED PRECEDING – All rows before the current row. n PRECEDING – n rows before the current row. CURRENT ROW – Just the current row. n FOLLOWING – n rows after the current row. UNBOUNDED FOLLOWING – All rows after the current row.

Spinner
DataFrameavailable as
df4
variable
-----LAG is a window function that compares the rows ' LAG(Champion,1) OVER
   --- (ORDER BY year ASC) AS Last_Champion'
----Champion u 1 satır öncesiyle karşılaştırıp önceki şampiyonu döndürüyor
   
WITH Weightlifting_Gold AS (
  SELECT
    -- Return each year's champions' countries
    Year,
    Country AS champion
  FROM Summer_Medals
  WHERE
    Discipline = 'Weightlifting' AND
    Event = '69KG' AND
    Gender = 'Men' AND
    Medal = 'Gold')

SELECT
  Year, Champion,
  -- Fetch the previous year's champion
  LAG(Champion,1) OVER
    (ORDER BY year ASC) AS Last_Champion
FROM Weightlifting_Gold
ORDER BY Year ASC;
Spinner
DataFrameavailable as
df5
variable
---PARTITION BY can be used to seperate row numbers by a specific column

WITH Athletics_Gold AS (
  SELECT DISTINCT
    Gender, Year, Event, Country
  FROM Summer_Medals
  WHERE
    Year >= 2000 AND
    Discipline = 'Athletics' AND
    Event IN ('100M', '10000M') AND
    Medal = 'Gold')

SELECT
  Gender, Year, Event,
  Country AS Champion,
  -- Fetch the previous year's champion by gender and event
  LAG(Country) OVER (PARTITION BY gender, event
            ORDER BY Year ASC) AS Last_Champion
FROM Athletics_Gold
ORDER BY Event ASC, Gender ASC, Year ASC;

LAG and LEAD fetching function