course
Introduction to SQL Joins
During your data analysis in SQL, you will often need to look at multiple tables and join them to get the desired results. Joining tables for analyzing data is a required skill for a data scientist as well. In this tutorial, you will learn about different types of joins that you can perform in PostgreSQL. You will first study the preliminary concepts of joins in SQL, then learn about different types of joins and what are their syntaxes in PostgreSQL. You will finish off this tutorial by executing various types of join queries using two tables. To be able to follow along, you should be familiar with the basics of SQL and how to write simple queries in PostgreSQL. If you are looking for resources learn these, the following ones can be useful -
- Beginner's Guide to PostgreSQL
- Intro to SQL for Data Science course by DataCamp
Let's begin.
Setting up the database environment in PostgreSQL
Before you study the basics of SQL joins it would be a good idea to create some tables for your analyses. You will need at least two tables having at least one column in common. Although you can execute self-join queries on the same table, let's ignore that fact for the time being.
You can perform all the SQL operations on a tool called pgAdmin
which generally comes with the installation of PostgreSQL. Let's open pgAdmin
and create a database (give it a name of your choice) to create the tables if you don't have an existing database to proceed with. You can then create the following two tables with the specifications as given -
- student_name (id, name)
- student_stream (id, stream)
Note that both the tables have id
column in common. You can use the following CREATE
statements to create the tables -
- For student_name -
CREATE TABLE student_name ( id smallint, name "char"[] );
- For student_stream -
CREATE TABLE student_stream ( id smallint, stream "char"[] );
Let's now insert some records into the tables.
- For student_name -
INSERT INTO student_name(id, name) VALUES (1, 'Sayak');
INSERT INTO student_name(id, name) VALUES (2, 'Alex');
INSERT INTO student_name(id, name) VALUES (3, 'Sameer');
INSERT INTO student_name(id, name) VALUES (4, 'Rick');
- For student_stream -
INSERT INTO student_stream(id, stream) VALUES (1, 'CS');
INSERT INTO student_stream(id, stream) VALUES (1, 'IT');
INSERT INTO student_stream(id, stream) VALUES (2, 'ECE');
INSERT INTO student_stream(id, stream) VALUES (9, 'ECE');
The table entries should be like the following in case you executed the above queries without changing anything -
You now have two simple tables ready. Let's now proceed towards studying the basics of SQL joins.
SQL Joins - Basics
SQL Joins allow you to collate two or more (sometimes just one table) tables using common identifiers. Take the example of the above two tables that you created - both the tables have id
column in common. You may question the need of joining in SQL. Let's discuss this in brief.
Due to normalization constraints, you may not have all the required information present in a single table. Normalization is not only desired but also is necessary in order to maintain consistency, to reduce redundancy and to prevent several insertions and update anomalies. Consider the above two table again. Suppose you wish to find out the streams in which Sayak
has enrolled. To get the answer to this question, you will need to join the two tables (sub-queries can also be used, but let's not consider that now) and then proceed accordingly.
For two tables to be joined there has to be something in common between the two. Does this mean both the tables have to have at least one column with the same name? Or what exactly is this condition?
The tables you wish to join may not have a column with the same name, but logically they should be the same, i.e., their data-types should be the same. You cannot join two tables having a column with the same name but different data-types. Let's now study the different types of SQL joins.
Different types of SQL Joins
In this section, you will study several types of SQL Joins -
- INNER JOIN
- Self-join
- OUTER JOIN
- LEFT JOIN
- RIGHT JOIN
- FULL JOIN
- CROSS JOIN
- Semi-join and Anti-join
PostgreSQL provides separate keywords for the types of joins mentioned in all capital letters. Let's study each of them one by one. For this, you will first study the joins visually, and then you will execute the respective join queries in PostgreSQL.
INNER JOIN
Take a look at the following diagram in order to understand INNER JOIN
visually -
In the above example, the column under consideration is the id
column. INNER JOIN
will ignore rest of the columns for which the values are not common in both the tables. Let's now execute a query in PostgreSQL which would perform INNER JOIN
between the two tables - student_name and student_stream.
Upon executing the following query, you get the result as presented in the above figure -
SELECT s1.id, s1.name, s2.stream
FROM student_name AS s1
INNER JOIN student_stream AS s2
ON s1.id = s2.id;
s1
and s2
are the aliases for the two tables. You used the ON
keyword along with INNER JOIN
. The query can be executed using the USING
keyword as well -
SELECT s1.id, s1.name, s2.stream
FROM student_name AS s1
INNER JOIN student_stream AS s2
USING (id);
Self Join allows you to perform joining the same table. But where this might be a good option? Suppose you have a table consisting of the following columns -
- Country
- Continent
Now you want to enlist two countries in a way that their continents are the same. The following figure should give you a sense of the expected results -
OUTER JOIN
OUTER JOIN
can further be divided into three types -
LEFT JOIN or Left Outer JOIN: The following diagram gives you a fair idea of LEFT JOIN
in SQL -
Note that unlike INNER JOIN
, LEFT JOIN
fetches you the records from the left (w.r.t to the order you specify in your query) table for which there was not any matching entry in the right table. This, in turn, tells you that Sameer
and Rick
have not enrolled in any streams. The respective query for this will be -
SELECT s1.id, s1.name, s2.stream
FROM student_name AS s1
LEFT JOIN student_stream AS s2
ON s1.id = s2.id;
RIGHT JOIN or Right Outer JOIN : RIGHT JOIN
is exactly the opposite of LEFT JOIN
-
RIGHT JOIN
can help you in finding the stream(s) for which no student has enrolled. The query for this would be -
SELECT s1.id, s1.name, s2.stream
FROM student_name AS s1
RIGHT JOIN student_stream AS s2
ON s1.id = s2.id;
FULL JOIN or Full Outer JOIN: FULL JOIN
lets you combine both LEFT JOIN
and RIGHT JOIN
into a single compilation -
The respective query would be -
SELECT s1.id, s1.name, s2.stream
FROM student_name AS s1
FULL JOIN student_stream AS s2
ON s1.id = s2.id;
Note that you can execute all the above OUTER JOIN
queries using the USING
keyword. Let's now study CROSS JOIN
.
CROSS JOIN
CROSS JOIN
is essentially the cartesian product between two elements expressed using SQL. Suppose, you need to have all the combinations possible in between two tables or even in a single table. You will need CROSS JOIN
for doing this. The following figure presents this concept visually -
You already have two tables to test this out. In order to have all possible combinations between the id
columns of the student_name
and student_stream
tables, you can execute the following query -
SELECT s1.id, s2.id
FROM student_name AS s1
CROSS JOIN student_stream AS s2;
And you get the following result -
Let's now see the other two types of join for which PostgreSQL does not provide any direct keywords.
Semi-join and Anti-join
Let's consider the tables that you created earlier in the tutorial:
Semi Join
queries are generally executed in the form of subqueries where rows are picked up from the first table with respect to a condition (or a set of conditions) that is matched in the second table. Let's assume the left table is the first table, and the right table is the second table.
Anti Join
queries are the exact opposite. In Anti Join
, rows are picked up from the first table with respect to a condition (or a set of conditions) that is not matched in the second table. Here is a diagram for you to understand this visually -
The query for realizing the Semi Join
would be -
select id, name
from student_name
where id IN
(select id from student_stream where stream
IN ('CS', 'IT', 'ECE'));
Similarly, the query that realizes Anti Join
, in this case, would be -
select id, name
from student_name
where id NOT IN
(select id from student_stream where stream
IN ('CS', 'IT', 'ECE'));
Note the use of NOT
the above query.
Taking your SQL Join wizardry to the next level
That is all for this tutorial. You studied about a bunch different types SQL Joins in this tutorial and also made yourself familiar to their PostgreSQL syntaxes. If you want to practice some challenging SQL Join exercises, you should take DataCamp's Joining Data in SQL course. In fact, materials from the course were used as references to make this tutorial.
Let me know your questions in the Comments
section.
Courses in SQL
course
Introduction to SQL
course
Intermediate SQL
cheat-sheet
SQL Joins Cheat Sheet
tutorial
Joins in SQL Tutorial
DataCamp Team
5 min
tutorial
Introduction to Indexing in SQL
tutorial
SQL Database Overview Tutorial
DataCamp Team
3 min
code-along
Getting Started in SQL
Kelsey McNeillie
code-along