Skip to content

Joining Data with SQL

Here you can access every table used in the course. To access each table, you will need to specify the world schema in your queries (e.g., world.countries for the countries table, and world.languages for the languages table).


Note: When using sample integrations such as those that contain course data, you have read-only access. You can run queries, but cannot make any changes such as adding, deleting, or modifying the data (e.g., creating tables, views, etc.).

Take Notes

_An Example of JOIN INs that combines codes from previous course _

  • SELECT region, AVG(gdp_percapita) AS avg_gdp
  • FROM countries AS c
  • LEFT JOIN economies AS e
  • USING(code)
  • WHERE year = 2010
  • GROUP BY region
  • ORDER BY avg_gdp DESC

  • LIMIT 10

when a left join is performed the order of tables should change otherwise u will have an issue Full join with multiple joines (you have to perform the Full join several times while applying the USING clause)

SELECT c1.name AS country, region, l.name AS language,basic_unit, frac_unit FROM countries as c1 -- Full join with languages (alias as l) FULL JOIN languages as l USING(code) -- Full join with currencies (alias as c2) FULL JOIN currencies AS c2 USING(code) WHERE region LIKE 'M%esia';

CROSS JOIN is followed by a WHERE clause (note this is not for filtering) as below
SELECT c.name AS country, region, life_expectancy AS life_exp FROM countries AS c -- Join to populations (alias as p) using an appropriate join INNER JOIN populations as P ON c.code = p.country_code -- Filter for only results in the year 2010 WHERE year = 2010 -- Sort by life_exp ORDER BY life_exp -- Limit to five records LIMIT 5

**unions and intersects ** SELECT * FROM economies2015 -- Set operation

UNION ------ union all brings all duplicates, union removes duplicates

-- Select all fields from economies2019 SELECT * FROM economies2019 ORDER BY code, year; ** intersect** SELECT name, country_code FROM cities INTERSECT SELECT name, code FROM countries alternatively ....countries.name means its from contries table and so give the samw result
SELECT cities.name, country_code FROM cities INTERSECT SELECT contries.name, code FROM countries

Example of a semi-join

SELECT DISTINCT name FROM languages -- Add syntax to use bracketed subquery below as a filter WHERE code IN (SELECT code FROM countries WHERE region = 'Middle East') ORDER BY name; (semi-join combines 2 seperate expressions)

ANTI-JOIN+

SELECT code, name FROM countries WHERE continent = 'Oceania' --~~ ~~Filter for countries not included in the bracketed subquery~~~~ AND code NOT IN (SELECT code FROM currencies);

Subqueries

A subquery, also known as a nested query or inner query, is a query that is embedded within another query. It is used to retrieve data from one or more tables and use that result set as a condition or filter in the outer query.

There are two types of subqueries:

  1. Single-row subquery: A subquery that returns only one row.
  2. Multiple-row subquery: A subquery that returns multiple rows.

Subqueries can be used in various parts of a SQL statement, such as the SELECT, FROM, WHERE, and HAVING clauses.

Let's explore some examples of subqueries.

Single-row Subquery

A single-row subquery returns only one row and can be used in a comparison operation or as a column expression.

Here's an example that uses a single-row subquery in the WHERE clause:

SELECT name FROM customers WHERE age > (SELECT AVG(age) FROM customers);

In this example, the subquery (SELECT AVG(age) FROM customers) calculates the average age of all customers, and the outer query selects the names of customers whose age is greater than the average age.

Multiple-row Subquery

A multiple-row subquery returns multiple rows and can be used in a comparison operation or as a table expression.

Here's an example that uses a multiple-row subquery in the FROM clause:

SELECT c.name, o.order_date FROM customers AS c JOIN (SELECT customer_id, MAX(order_date) AS latest_order_date FROM orders GROUP BY customer_id) AS o ON c.id = o.customer_id;

In this example, the subquery (SELECT customer_id, MAX(order_date) AS latest_order_date FROM orders GROUP BY customer_id) calculates the latest order date for each customer, and the outer query joins the customers table with the subquery result to retrieve the customer name and latest order date.

**question **

You can use SQL to do calculations for you. Suppose you only want records from 2015 with life_expectancy above 1.15 * avg_life_expectancy. You could use the following SQL query.(step one/first argument)

_The answer from your query has now been nested into another query; use this calculation to filter populations for all records where life_expectancy is 1.15 times higher than average.

_

SELECT * FROM populations -- Filter for only those populations where life expectancy is 1.15 times higher than average WHERE life_expectancy > 1.15 * (SELECT AVG(life_expectancy) FROM populations WHERE year = 2015) AND year = 2015;

?????? Return the name, country_code and urbanarea_pop for all capital cities (not aliased). --~~ Select relevant fields from cities table~~

SELECT name, country_code, urbanarea_pop

FROM cities

-- Filter using a subquery on the countries table

WHERE name IN

(SELECT capital

FROM countries)

ORDER BY urbanarea_pop DESC;

**Select local_name from countries, with the aliased lang_num from your subquery (which has been nested and aliased for you as sub). Use WHERE to match the code field from countries and sub.

  • Select local_name and lang_num from appropriate tables

SELECT local_name, sub.lang_num** FROM countries, (SELECT code, COUNT(*) AS lang_num FROM languages GROUP BY code) AS sub...........this sub is then used in the select statement. -- Where codes match
WHERE countries.code = sub.code ORDER BY lang_num DESC;

Select country code, inflation_rate, and unemployment_rate from economies. Filter code for the set of countries which do not contain the words "Republic" or "Monarchy" in their gov_form.

  • Select relevant fields SELECT code, inflation_rate, unemployment_rate FROM economies WHERE year = 2015 AND code NOT IN (SELECT code FROM countries WHERE (gov_form LIKE '%Monarchy%' OR gov_form LIKE '%Republic%')) ORDER BY inflation_rate; -- Subquery returning country codes filtered on gov_form

Subqueries inside FROM

Subqueries can be used inside the FROM clause of a SQL query to create a temporary table that can be used in the main query. This can be useful when you need to perform complex calculations or filtering on a subset of data before joining it with other tables.

Here is an example of using a subquery inside the FROM clause:

SELECT * FROM ( SELECT customer_id, SUM(amount) AS total_amount FROM orders GROUP BY customer_id ) AS subquery WHERE total_amount > 1000;

In this example, the subquery calculates the total amount of orders for each customer, and then the main query selects only the customers whose total amount is greater than 1000.

You can also use subqueries inside the FROM clause to join multiple tables or perform other operations. Just make sure to give the subquery a unique alias so that it can be referenced in the main query.

----------------------final exercise ------------

Select country code, inflation_rate, and unemployment_rate from economies. Filter code for the set of countries which do not contain the words "Republic" or "Monarchy" in their gov_form.

SELECT code, inflation_rate, unemployment_rate FROM economies WHERE year = 2015 AND code NOT IN (SELECT code FROM countries WHERE (gov_form LIKE '%Monarchy%' OR gov_form LIKE '%Republic%')) ORDER BY inflation_rate; -- Subquery returning country codes filtered on gov_form

From cities, select the city name, country code, proper population, and metro area population, as well as the field city_perc, which calculates the proper population as a percentage of metro area population for each city (using the formula provided). Filter city name with a subquery that selects capital cities from countries in 'Europe' or continents with 'America' at the end of their name. Exclude NULL values in metroarea_pop. Order by city_perc (descending) and return only the first 10 rows.

SELECT name, country_code, city_proper_pop, metroarea_pop, city_proper_pop / metroarea_pop * 100 AS city_perc FROM cities -- Use subquery to filter city name WHERE name IN (SELECT capital FROM countries WHERE (continent = 'Europe' OR continent LIKE '%America')) -- Add filter condition such that metroarea_pop does not have null values AND metroarea_pop IS NOT NULL -- Sort and limit the result ORDER BY city_perc DESC LIMIT 10;

SQL Joins

SQL joins are used to combine rows from two or more tables based on a related column between them. There are different types of joins in SQL, including:

  • Inner Join
  • Left Join
  • Right Join
  • Full Outer Join

Let's explore each type of join with examples.

Inner Join

An inner join returns only the rows that have matching values in both tables. It combines rows from two tables based on a related column between them.

Here's an example of an inner join:

SELECT * FROM table1 INNER JOIN table2 ON table1.column = table2.column;

Left Join

A left join returns all the rows from the left table and the matching rows from the right table. If there are no matching rows in the right table, NULL values are returned for the columns of the right table.

Here's an example of a left join:

SELECT * FROM table1 LEFT JOIN table2 ON table1.column = table2.column;

Right Join

A right join returns all the rows from the right table and the matching rows from the left table. If there are no matching rows in the left table, NULL values are returned for the columns of the left table.

Here's an example of a right join:

SELECT * FROM table1 RIGHT JOIN table2 ON table1.column = table2.column;

Full Outer Join

A full outer join returns all the rows from both tables, including the unmatched rows. If there are no matching rows, NULL values are returned for the columns of the table that doesn't have a match.

Here's an example of a full outer join:

SELECT * FROM table1 FULL OUTER JOIN table2 ON table1.column = table2.column;