Skip to content
New Workbook
Sign up
Joining Data in SQL

Joining Data with SQL

Here you can access every table used in the course. To access each table, you will need to specify the world schema in your queries (e.g., world.countries for the countries table, and world.languages for the languages table).

Take Notes

Add notes about the concepts you've learned and SQL cells with queries you want to keep.

Add your notes here

Spinner
DataFrameavailable as
world_info
variable
SELECT countries.name AS country,
-- Subquery that provides the count of cities   
  (SELECT COUNT(*)
   FROM cities
   WHERE cities.country_code = countries.code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;

vs

-- Find top nine countries with the most cities
select countries.name as country, count(*) as cities_num
from countries
left JOIN cities
on countries.code = cities.country_code 
-- Order by count of cities as cities_num
group by country
order by cities_num DESC
limit 9;
Spinner
DataFrameavailable as
df
variable
SELECT name, e.year, fertility_rate, unemployment_rate
FROM countries AS c
INNER JOIN populations AS p
ON c.code = p.country_code
INNER JOIN economies AS e
ON e.code = p.code
AND e.year = p.year
Spinner
DataFrameavailable as
df
variable
Run cancelled
SELECT c.name AS country, l.name AS language
FROM countries AS c        
-- Perform a cross join to languages (alias as l)
cross join languages as l
WHERE c.code in ('PAK','IND')
	AND l.code in ('PAK','IND');

Create a semi join out of the two queries you've written, which filters unique languages returned in the first query for only those languages spoken in the 'Middle East'

Spinner
DataFrameavailable as
df
variable
SELECT DISTINCT name
FROM languages
-- Add syntax to use bracketed subquery below as a filter
WHERE code IN
    (SELECT language_code
    FROM countries
    WHERE region = 'Middle East')
ORDER BY name;
Spinner
DataFrameavailable as
df
variable
SELECT *
FROM populations
-- Filter for only those populations where life expectancy is 1.15 times higher than average
WHERE life_expectancy > 1.15 *
  (SELECT AVG(life_expectancy)
   FROM populations
   WHERE year = 2015) 
    AND year = 2015;
Spinner
DataFrameavailable as
df
variable
-- Select local_name and lang_num from appropriate tables
SELECT local_name, sub.lang_num
FROM countries,
    (SELECT code, COUNT(*) AS lang_num
     FROM languages
     GROUP BY code) AS sub
-- Where codes match    
WHERE countries.code = sub.code
ORDER BY lang_num DESC;
Spinner
DataFrameavailable as
df
variable
-- Select relevant fields
SELECT code, inflation_rate, unemployment_rate
FROM economies
WHERE year = 2015 
  AND code NOT IN
-- Subquery returning country codes filtered on gov_form
    (SELECT code
     FROM countries
     WHERE (gov_form LIKE '%Monarchy%' OR gov_form LIKE '%Republic%'))
ORDER BY inflation_rate;
Spinner
DataFrameavailable as
df
variable
-- Select fields from cities
select name, country_code, city_proper_pop, metroarea_pop, (city_proper_pop/metroarea_pop)*100 as city_perc
from cities
-- Use subquery to filter city name
where name in ( select capital
from countries
where (continent = 'Europe' OR continent LIKE '%America') )
and metroarea_pop is not null
order by city_perc DESC
limit 10
Spinner
DataFrameavailable as
df
variable
SELECT 
    CASE WHEN hometeam_id = 10189 THEN 'FC Schalke 04'
         WHEN hometeam_id = 9823 THEN 'FC Bayern Munich'
         ELSE 'Other' END AS home_team,
	COUNT(id) AS total_matches
FROM matches_germany
-- Group by the CASE statement alias
GROUP BY home_team;