Introduction to SQL Joins
During your data analysis in SQL, you will often need to look at multiple tables and join them to get the desired results. Joining tables for analyzing data is a required skill for a data scientist as well. In this tutorial, you will learn about different types of joins that you can perform in PostgreSQL. You will first study the preliminary concepts of joins in SQL, then learn about different types of joins and what are their syntaxes in PostgreSQL. You will finish off this tutorial by executing various types of join queries using two tables. To be able to follow along, you should be familiar with the basics of SQL and how to write simple queries in PostgreSQL. If you are looking for resources learn these, the following ones can be useful -
Setting up the database environment in PostgreSQL
Before you study the basics of SQL joins it would be a good idea to create some tables for your analyses. You will need at least two tables having at least one column in common. Although you can execute self-join queries on the same table, let's ignore that fact for the time being.
You can perform all the SQL operations on a tool called
pgAdmin which generally comes with the installation of PostgreSQL. Let's open
pgAdmin and create a database (give it a name of your choice) to create the tables if you don't have an existing database to proceed with. You can then create the following two tables with the specifications as given -
- student_name (id, name)
- student_stream (id, stream)
Note that both the tables have
id column in common. You can use the following
CREATE statements to create the tables -
- For student_name -
CREATE TABLE student_name ( id smallint, name "char" );
- For student_stream -
CREATE TABLE student_stream ( id smallint, stream "char" );
Let's now insert some records into the tables.
- For student_name -
INSERT INTO student_name(id, name) VALUES (1, 'Sayak'); INSERT INTO student_name(id, name) VALUES (2, 'Alex'); INSERT INTO student_name(id, name) VALUES (3, 'Sameer'); INSERT INTO student_name(id, name) VALUES (4, 'Rick');
- For student_stream -
INSERT INTO student_stream(id, stream) VALUES (1, 'CS'); INSERT INTO student_stream(id, stream) VALUES (1, 'IT'); INSERT INTO student_stream(id, stream) VALUES (2, 'ECE'); INSERT INTO student_stream(id, stream) VALUES (9, 'ECE');
The table entries should be like the following in case you executed the above queries without changing anything -
You now have two simple tables ready. Let's now proceed towards studying the basics of SQL joins.
SQL Joins - Basics
SQL Joins allow you to collate two or more (sometimes just one table) tables using common identifiers. Take the example of the above two tables that you created - both the tables have
id column in common. You may question the need of joining in SQL. Let's discuss this in brief.
Due to normalization constraints, you may not have all the required information present in a single table. Normalization is not only desired but also is necessary in order to maintain consistency, to reduce redundancy and to prevent several insertions and update anomalies. Consider the above two table again. Suppose you wish to find out the streams in which
Sayak has enrolled. To get the answer to this question, you will need to join the two tables (sub-queries can also be used, but let's not consider that now) and then proceed accordingly.
For two tables to be joined there has to be something in common between the two. Does this mean both the tables have to have at least one column with the same name? Or what exactly is this condition?
The tables you wish to join may not have a column with the same name, but logically they should be the same, i.e., their data-types should be the same. You cannot join two tables having a column with the same name but different data-types. Let's now study the different types of SQL joins.
Different types of SQL Joins
In this section, you will study several types of SQL Joins -
- INNER JOIN
- OUTER JOIN
- LEFT JOIN
- RIGHT JOIN
- FULL JOIN
- CROSS JOIN
- Semi-join and Anti-join
PostgreSQL provides separate keywords for the types of joins mentioned in all capital letters. Let's study each of them one by one. For this, you will first study the joins visually, and then you will execute the respective join queries in PostgreSQL.
Take a look at the following diagram in order to understand
INNER JOIN visually -
In the above example, the column under consideration is the
INNER JOIN will ignore rest of the columns for which the values are not common in both the tables. Let's now execute a query in PostgreSQL which would perform
INNER JOIN between the two tables - student_name and student_stream.
Upon executing the following query, you get the result as presented in the above figure -
SELECT s1.id, s1.name, s2.stream FROM student_name AS s1 INNER JOIN student_stream AS s2 ON s1.id = s2.id;
s2 are the aliases for the two tables. You used the
ON keyword along with
INNER JOIN. The query can be executed using the
USING keyword as well -
SELECT s1.id, s1.name, s2.stream FROM student_name AS s1 INNER JOIN student_stream AS s2 USING (id);
Self Join allows you to perform joining the same table. But where this might be a good option? Suppose you have a table consisting of the following columns -
Now you want to enlist two countries in a way that their continents are the same. The following figure should give you a sense of the expected results -
OUTER JOIN can further be divided into three types -
LEFT JOIN or Left Outer JOIN: The following diagram gives you a fair idea of
LEFT JOIN in SQL -
Note that unlike
LEFT JOIN fetches you the records from the left (w.r.t to the order you specify in your query) table for which there was not any matching entry in the right table. This, in turn, tells you that
Rick have not enrolled in any streams. The respective query for this will be -
SELECT s1.id, s1.name, s2.stream FROM student_name AS s1 LEFT JOIN student_stream AS s2 ON s1.id = s2.id;
RIGHT JOIN or Right Outer JOIN :
RIGHT JOIN is exactly the opposite of
LEFT JOIN -
RIGHT JOIN can help you in finding the stream(s) for which no student has enrolled. The query for this would be -
SELECT s1.id, s1.name, s2.stream FROM student_name AS s1 RIGHT JOIN student_stream AS s2 ON s1.id = s2.id;
FULL JOIN or Full Outer JOIN:
FULL JOIN lets you combine both
LEFT JOIN and
RIGHT JOIN into a single compilation -
The respective query would be -
SELECT s1.id, s1.name, s2.stream FROM student_name AS s1 FULL JOIN student_stream AS s2 ON s1.id = s2.id;
Note that you can execute all the above
OUTER JOIN queries using the
USING keyword. Let's now study
CROSS JOIN is essentially the cartesian product between two elements expressed using SQL. Suppose, you need to have all the combinations possible in between two tables or even in a single table. You will need
CROSS JOIN for doing this. The following figure presents this concept visually -
You already have two tables to test this out. In order to have all possible combinations between the
id columns of the
student_stream tables, you can execute the following query -
SELECT s1.id, s2.id FROM student_name AS s1 CROSS JOIN student_stream AS s2;
And you get the following result -
Let's now see the other two types of join for which PostgreSQL does not provide any direct keywords.
Semi-join and Anti-join
Let's consider the tables that you created earlier in the tutorial:
Semi Join queries are generally executed in the form of subqueries where rows are picked up from the first table with respect to a condition (or a set of conditions) that is matched in the second table. Let's assume the left table is the first table, and the right table is the second table.
Anti Join queries are the exact opposite. In
Anti Join, rows are picked up from the first table with respect to a condition (or a set of conditions) that is not matched in the second table. Here is a diagram for you to understand this visually -
The query for realizing the
Semi Join would be -
select id, name from student_name where id IN (select id from student_stream where stream IN ('CS', 'IT', 'ECE'));
Similarly, the query that realizes
Anti Join, in this case, would be -
select id, name from student_name where id NOT IN (select id from student_stream where stream IN ('CS', 'IT', 'ECE'));
Note the use of
NOT the above query.
Taking your SQL Join wizardry to the next level
That is all for this tutorial. You studied about a bunch different types SQL Joins in this tutorial and also made yourself familiar to their PostgreSQL syntaxes. If you want to practice some challenging SQL Join exercises, you should take DataCamp's Joining Data in SQL course. In fact, materials from the course were used as references to make this tutorial.
Let me know your questions in the