Skip to main content

Introduction to SQL Joins

In this tutorial, you'll learn about the mechanics of joins in SQL and its different types.
Mar 2019  · 9 min read

During your data analysis in SQL, you will often need to look at multiple tables and join them to get the desired results. Joining tables for analyzing data is a required skill for a data scientist as well. In this tutorial, you will learn about different types of joins that you can perform in PostgreSQL. You will first study the preliminary concepts of joins in SQL, then learn about different types of joins and what are their syntaxes in PostgreSQL. You will finish off this tutorial by executing various types of join queries using two tables. To be able to follow along, you should be familiar with the basics of SQL and how to write simple queries in PostgreSQL. If you are looking for resources learn these, the following ones can be useful -

Let's begin.

Setting up the database environment in PostgreSQL

Before you study the basics of SQL joins it would be a good idea to create some tables for your analyses. You will need at least two tables having at least one column in common. Although you can execute self-join queries on the same table, let's ignore that fact for the time being.

You can perform all the SQL operations on a tool called pgAdmin which generally comes with the installation of PostgreSQL. Let's open pgAdmin and create a database (give it a name of your choice) to create the tables if you don't have an existing database to proceed with. You can then create the following two tables with the specifications as given -

  • student_name (id, name)
  • student_stream (id, stream)

Note that both the tables have id column in common. You can use the following CREATE statements to create the tables -

  • For student_name -
    CREATE TABLE student_name
    id smallint,
    name "char"[]
  • For student_stream -
    CREATE TABLE student_stream
    id smallint,
    stream "char"[]

Let's now insert some records into the tables.

  • For student_name -
INSERT INTO student_name(id, name) VALUES (1, 'Sayak');

INSERT INTO student_name(id, name) VALUES (2, 'Alex');

INSERT INTO student_name(id, name) VALUES (3, 'Sameer');

INSERT INTO student_name(id, name) VALUES (4, 'Rick');
  • For student_stream -
INSERT INTO student_stream(id, stream) VALUES (1, 'CS');

INSERT INTO student_stream(id, stream) VALUES (1, 'IT');

INSERT INTO student_stream(id, stream) VALUES (2, 'ECE');

INSERT INTO student_stream(id, stream) VALUES (9, 'ECE');

The table entries should be like the following in case you executed the above queries without changing anything -

student_name and student_stream tables

You now have two simple tables ready. Let's now proceed towards studying the basics of SQL joins.

SQL Joins - Basics

SQL Joins allow you to collate two or more (sometimes just one table) tables using common identifiers. Take the example of the above two tables that you created - both the tables have id column in common. You may question the need of joining in SQL. Let's discuss this in brief.

Due to normalization constraints, you may not have all the required information present in a single table. Normalization is not only desired but also is necessary in order to maintain consistency, to reduce redundancy and to prevent several insertions and update anomalies. Consider the above two table again. Suppose you wish to find out the streams in which Sayak has enrolled. To get the answer to this question, you will need to join the two tables (sub-queries can also be used, but let's not consider that now) and then proceed accordingly.

For two tables to be joined there has to be something in common between the two. Does this mean both the tables have to have at least one column with the same name? Or what exactly is this condition?

The tables you wish to join may not have a column with the same name, but logically they should be the same, i.e., their data-types should be the same. You cannot join two tables having a column with the same name but different data-types. Let's now study the different types of SQL joins.

Different types of SQL Joins

In this section, you will study several types of SQL Joins -

    • Self-join
  • Semi-join and Anti-join

PostgreSQL provides separate keywords for the types of joins mentioned in all capital letters. Let's study each of them one by one. For this, you will first study the joins visually, and then you will execute the respective join queries in PostgreSQL.


Take a look at the following diagram in order to understand INNER JOIN visually -

Inner join diagram

In the above example, the column under consideration is the id column. INNER JOIN will ignore rest of the columns for which the values are not common in both the tables. Let's now execute a query in PostgreSQL which would perform INNER JOIN between the two tables - student_name and student_stream.

Upon executing the following query, you get the result as presented in the above figure -

FROM student_name AS s1
INNER JOIN student_stream AS s2
ON =;

s1 and s2 are the aliases for the two tables. You used the ON keyword along with INNER JOIN. The query can be executed using the USING keyword as well -

FROM student_name AS s1
INNER JOIN student_stream AS s2
USING (id);

Self Join allows you to perform joining the same table. But where this might be a good option? Suppose you have a table consisting of the following columns -

  • Country
  • Continent

Now you want to enlist two countries in a way that their continents are the same. The following figure should give you a sense of the expected results -

self join result (table form)


OUTER JOIN can further be divided into three types -

LEFT JOIN or Left Outer JOIN: The following diagram gives you a fair idea of LEFT JOIN in SQL -

Diagram of left join in SQL

Note that unlike INNER JOIN, LEFT JOIN fetches you the records from the left (w.r.t to the order you specify in your query) table for which there was not any matching entry in the right table. This, in turn, tells you that Sameer and Rick have not enrolled in any streams. The respective query for this will be -

FROM student_name AS s1
LEFT JOIN student_stream AS s2
ON =;

RIGHT JOIN or Right Outer JOIN : RIGHT JOIN is exactly the opposite of LEFT JOIN -

diagram of right join in SQL

RIGHT JOIN can help you in finding the stream(s) for which no student has enrolled. The query for this would be -

FROM student_name AS s1
RIGHT JOIN student_stream AS s2
ON =;

FULL JOIN or Full Outer JOIN: FULL JOIN lets you combine both LEFT JOIN and RIGHT JOIN into a single compilation -

Diagram of full join in SQL

The respective query would be -

FROM student_name AS s1
FULL JOIN student_stream AS s2
ON =;

Note that you can execute all the above OUTER JOIN queries using the USING keyword. Let's now study CROSS JOIN.


CROSS JOIN is essentially the cartesian product between two elements expressed using SQL. Suppose, you need to have all the combinations possible in between two tables or even in a single table. You will need CROSS JOIN for doing this. The following figure presents this concept visually -

Diagram of cross join

You already have two tables to test this out. In order to have all possible combinations between the id columns of the student_name and student_stream tables, you can execute the following query -

FROM student_name AS s1
CROSS JOIN student_stream AS s2;

And you get the following result -

All possible combinations table

Let's now see the other two types of join for which PostgreSQL does not provide any direct keywords.

Semi-join and Anti-join

Let's consider the tables that you created earlier in the tutorial:

Original table

Semi Join queries are generally executed in the form of subqueries where rows are picked up from the first table with respect to a condition (or a set of conditions) that is matched in the second table. Let's assume the left table is the first table, and the right table is the second table.

Anti Join queries are the exact opposite. In Anti Join, rows are picked up from the first table with respect to a condition (or a set of conditions) that is not matched in the second table. Here is a diagram for you to understand this visually -

Semi join and anti join diagrams

The query for realizing the Semi Join would be -

select id, name
from student_name
where id IN
(select id from student_stream where stream
IN ('CS', 'IT', 'ECE'));

Similarly, the query that realizes Anti Join, in this case, would be -

select id, name
from student_name
where id NOT IN
(select id from student_stream where stream
IN ('CS', 'IT', 'ECE'));

Note the use of NOT the above query.

Taking your SQL Join wizardry to the next level

That is all for this tutorial. You studied about a bunch different types SQL Joins in this tutorial and also made yourself familiar to their PostgreSQL syntaxes. If you want to practice some challenging SQL Join exercises, you should take DataCamp's Joining Data in SQL course. In fact, materials from the course were used as references to make this tutorial.

Let me know your questions in the Comments section.

Courses in SQL

Joining Data in SQL

4 hr
Level up your SQL knowledge and learn to join tables together, apply relational set theory, and work with subqueries.
See DetailsRight Arrow
Start Course
See MoreRight Arrow

20 Data Analytics Projects for All Levels

Explore our list of data analytics projects for beginners, final-year students, and professionals. The list consists of guided/unguided projects and tutorials with source code.
Abid Ali Awan's photo

Abid Ali Awan

17 min

Gary Wolf- Dataframed 119.png

Data-Driven Thinking for Everyday Life

Gary Wolf talks about what The Quantified Self is, why self-tracking projects can be life-changing, how to get started with self-tracking, and how to connect with others.

Richie Cotton's photo

Richie Cotton

55 min

Reshaping Data with pandas in Python

Pandas DataFrames are commonly used in Python for data analysis, with observations containing values or variables related to a single object and variables representing attributes across all observations.
Richie Cotton's photo

Richie Cotton

Reshaping Data with tidyr in R

In this cheat sheet, you will learn how to reshape data with tidyr. From separating and combining columns, to dealing with missing data, you'll get the download on how to manipulate data in R.
Richie Cotton's photo

Richie Cotton

6 min

Data Quality Dimensions Cheat Sheet

In this cheat sheet, you'll learn about data quality dimensions, allowing you to ensure that your data is fit for purpose.
Joe Franklin's photo

Joe Franklin

3 min

Top Techniques to Handle Missing Values Every Data Scientist Should Know

Explore various techniques to efficiently handle missing values and their implementations in Python.
Zoumana Keita 's photo

Zoumana Keita

15 min

See MoreSee More