Tutorials
data analysis
+2

Introduction to the Where Clause in SQL

In this tutorial, you will be introduced to filtering rows in SQL using the where clause.

Filtering rows from a table with respect to a given condition or a set of conditions is a very common practice in data analysis. In SQL, you can do this using the where clause. In this tutorial, you will be using it for a number of different scenarios.

You will use PostgreSQL as the RDBMS (Relational Database Management System), but the syntaxes do not vary much when it comes to using other database systems. You will start off by restoring a PostgreSQL database with a relevant table to work upon, and then you will execute several filter queries incorporating the where clause.

Note: To follow along with the tutorial, you must be familiar with the basics of SQL and PostgreSQL. Following resources might be helpful if you are not familiar with them.

Restoring a PostgreSQL database

Almost every RDBMS allows us to take a backup copy of a particular database and restore it. For this tutorial, you will use this backup file into a PostgreSQL database. In PostgreSQL, database backups as exported as .backup files. The given DB backup has a table within itself which would be used throughout this tutorial. To restore the backup, you will need to first create a database in PostgreSQL. So, open up the pgAdmin tool and create a new PostgreSQL database. Give it a name of your choice.

After the database is created, right-click on it and select Restore. The following dialog-box should open up -

Keep the settings as shown in the above figure. You will have to navigate to the path where the DB backup file is located. After this, click on the Restore option. The backup would get restored into the database you selected. Under the Tables section, you should see an entry for the table countries -

Feel free to execute a select query on the table to see its contents.

Inspecting the countries table

Following is a snapshot of the countries table you just restored -

The table has different information about a total of 206 countries around the globe. The table has the following columns -

  • code which is the primary key of the table and denotes the universal country code for a given country
  • name i.e., name of a country
  • continent to which a country belongs
  • region viz. the geographical location where a country is located
  • surface_area of the country
  • indep_year i.e., the year in which a country got its independence
  • local_name of a country
  • gov_form i.e., the kind of government that is active in a country
  • capital of a country
  • cap_long i.e., longitude of the capital of a country
  • cap_lat i.e., latitude of the capital of a country

Let's now start filtering some records with the where clause.

Where clause examples

In SQL, where clause lets you filter records based on both textual and numeric values. You can compare these values, and the following comparison-based operations are supported -

  • = equal
  • <> and != not equal
  • < less than
  • > greater than
  • <= less than or equal to
  • >= greater than or equal to

It's advised to use <> rather than != as per the SQL standards.

Let's now get the record of the country where the code is AUS. The column you should compare here is the code column and the value to be compared is AUS. The query would be -

select * from countries where code = 'AUS';

And you get -

Notice a couple of things here -

  • The where clause always comes after from.
  • In SQL, the strings quoted using single quotes.

Let's now see a numerical comparison. Let's get the records of the country which gained their independence (indep_year) after 1900 -

select * from countries where indep_year > 1901;

You get the records of a total of 145 countries which gained their independence after 1900. Take note of the comparator used here. The indep_yearcolumn is an integer in type. Hence, no quotes in the comparison.

You can connect multiple conditions in a where clause using AND and OR connectives. Following query gets you the records of those countries where continent = 'Asia' and indep_year > 1900 -

select * from countries where continent = 'Asia' and indep_year > 1900;

You get information about a total of 42 countries which are situated in the continent of Asia and gained their independence after 1900. Note that in the above query, you compared both textual and numerical values. As an exercise, you play with different combinations of AND and OR in the where clause.

Note: When you are combining AND and OR, don't forget to enclose each of the condition within parentheses. Otherwise, SQL will get confused to determine of the precedence the evaluation of the conditions.

With the following query, you get a total of 66 countries with condition as where (continent = 'Europe' and indep_year > 1900) or (continent = 'Asia' and indep_year > 1900).

select * from countries where
(continent = 'Europe' and indep_year > 1900)
or
(continent = 'Asia' and indep_year > 1900);

You can even connect more than two conditions in a where clause -

select * from countries where continent = 'Asia' or indep_year > 1900 or surface_area > 652090;

You get a total of 165 countries that are either situated in Asia continent or gained their independence after 1900 or having a surface area greater than 652090. Let's now convert the query into its AND variant where each of the conditions will be connected using AND.

select * from countries where continent = 'Asia' and indep_year > 1900 and surface_area > 652090;

You will get a total of 9 countries in this case.

If you are comparing the same column value for more than once you will have to specify the column name each time. Refer the following query -

select * from countries where indep_year = 1947 or indep_year = 1912;

But the query select * from countries where indep_year = 1947 or 1912; is wrong.

You can pair up sub-queries in a where clause as well. Let's say you want to fetch the record of the country having the highest surface area, but you don't know what the maximum surface area obtained by a country is. The following query can come to rescue in situations like this -

select * from countries where surface_area in (select max(surface_area) from countries);

You should get the record of Russia.

Conclusion

Congrats, you have completed this tutorial. Filtering query results with respect to conditions is definitely a vital skill, and this tutorial served you as an introduction to it. If you want to hone in your SQL skills even further, then you can check out the following DataCamp courses -

Want to leave a comment?