Skip to content

Joining Data with SQL

Here you can access every table used in the course. To access each table, you will need to specify the world schema in your queries (e.g., world.countries for the countries table, and world.languages for the languages table).


Note: When using sample integrations such as those that contain course data, you have read-only access. You can run queries, but cannot make any changes such as adding, deleting, or modifying the data (e.g., creating tables, views, etc.).

Take Notes

Add notes about the concepts you've learned and SQL cells with queries you want to keep.

Add your notes here

Spinner
DataFrameas
world_info
variable
-- Add your own queries here
SELECT *
FROM world.languages
LIMIT 5;
Spinner
DataFrameas
df
variable
-- Keep only the name of the city, the name of the country, and the region the country is located in (in the order specified). Alias the name of the city AS city and the name of the country AS country.
SELECT
    cities.name AS city,
    countries.name AS country,
    countries.region
FROM cities
-- Perform an inner join with the cities table on the left and the countries table on the right.
INNER JOIN countries
-- Match on country codes
ON cities.country_code = countries.code;
Spinner
DataFrameas
df1
variable
-- Perform a LEFT JOIN and use aggregate function AVG().
-- Note the order of the syntax. Keep in mind that the order of the syntax is not necessarily the order of execution!
SELECT region, AVG(gdp_percapita) AS avg_gdp
FROM countries AS c
LEFT JOIN economies AS e
	USING(code)
WHERE year = 2010
GROUP BY region
-- Order by descending avg_gdp
ORDER BY AVG(gdp_percapita) DESC 
-- Return only first 10 records
LIMIT 10;
Spinner
DataFrameas
df2
variable
-- You will determine the names of the five countries and their respective regions with the lowest life expectancy for the year 2010. Use your knowledge about joins, filtering, sorting and limiting to create this list!
SELECT 
	c.name AS country,
    region,
    life_expectancy AS life_exp
FROM countries AS c
-- Join to populations (alias as p) using an appropriate join
FULL JOIN populations as p
    ON c.code = p.country_code
-- Filter for only results in the year 2010
WHERE year = 2010
-- Sort by life_exp
ORDER BY life_expectancy
-- Limit to five records
LIMIT 5;
Spinner
DataFrameas
df3
variable
-- Follow the instructions below to get the urban area population for capital cities only. You'll use the countries and cities tables displayed in the console to help identify columns of interest as you build your query.
-- Return the name, country_code and urbanarea_pop for all capital cities (not aliased).
-- Select relevant fields from cities table
SELECT name, country_code, urbanarea_pop
FROM cities
-- Filter using a subquery on the countries table
WHERE cities.name IN
    (SELECT capital
    FROM countries)
ORDER BY urbanarea_pop DESC;
Spinner
DataFrameas
df4
variable
-- In Step 1, you'll begin with a LEFT JOIN combined with a GROUP BY to select the nine countries with the most cities appearing in the cities table, along with the counts of these cities. In Step 2, you'll write a query that returns the same result as the join, but leveraging a nested query instead.
--
-- 1. Write a LEFT JOIN with countries on the left and the cities on the right, joining on country code.
-- 2. In the SELECT statement of your join, include country names as country, and count the cities in each country, aliased as cities_num.
-- 3. Sort by cities_num (descending), and country (ascending), limiting to the first nine records.
-- Find top nine countries with the most cities
SELECT
    countries.name as country,
    (SELECT COUNT(*)
    FROM cities
    WHERE countries.code = cities.country_code) as cities_num
FROM countries
LEFT JOIN cities  
    ON countries.code = cities.country_code
GROUP BY countries.code, cities.country_code
-- Parece que tuviera que agrupar los datos de ambas tablas: izq por countries y der por cities
-- Order by count of cities as cities_num
ORDER BY cities_num DESC,
        country ASC
LIMIT 9;
Spinner
DataFrameas
df5
variable
-- Segunda parte
--Complete the subquery to return a result equivalent to your LEFT JOIN, counting all cities in the cities table as cities_num.
-- Use the WHERE clause to enable the correct country codes to be matched in the cities and countries columns.
SELECT countries.name AS country,
-- Subquery that provides the count of cities   
  (SELECT COUNT(*)
   FROM cities
   WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
Spinner
DataFrameas
df6
variable
-- NOTA: las subqueries pueden ir en el WHERE o en el FROM. En este ejercicio, van en el WHERE:
-- our task is to determine the top 10 capital cities in Europe and the Americas by city_perc, a metric you'll calculate. city_perc is a percentage that calculates the "proper" population in a city as a percentage of the total population in the wider metro area, as follows:
-- city_proper_pop / metroarea_pop * 100
-- Do not use table aliasing in this exercise.
--
-- Select fields from cities
SELECT 
    DISTINCT name,
    country_code,
    city_proper_pop,
    metroarea_pop,
    ((city_proper_pop / metroarea_pop) * 100) as city_perc
FROM cities
-- Use subquery to filter city name
WHERE name IN
    (SELECT
        capital
    FROM countries
    WHERE continent like 'Europe'
        or continent LIKE '%America')
-- Add filter condition such that metroarea_pop does not have null values
AND metroarea_pop IS NOT NULL
-- Sort and limit the result
ORDER BY city_perc DESC
LIMIT 10;