Course
Structured Query Language (SQL) is one of the most important programming languages in the world and is heavily used by data scientist or data engineers.
As more people turn to data to drive their business, demand for SQL expertise has only increased over the years. Fortunately, the language is easy-to-learn and well established, which means it has a large community of users to support you if you get stuck.
In this article, we will cover a specific clause from the SQL language called QUALIFY. By the end of this article, you will know:
- The basic concept and syntax for QUALIFY
- When to use the QUALIFY clause
- How it differs from other filtering methods.
Associate Data Engineer in SQL
Basic Concept and Syntax of the SQL QUALIFY Clause
QUALIFY is a clause used to filter the results of a window function. Therefore, to successfully use the QUALIFY clause, there must be at least one WINDOW function in the SELECT list or QUALIFY clause – only the rows where the boolean expression evaluates to TRUE will be returned.
Tip: Check out the SQL Window Functions Cheat Sheet.
If there was no QUALIFY clause, filtering on the results of a WINDOW function would require nesting. SQL developers typically recommend against using nested queries as much as possible because they make the code less human-readable while increasing the complexity of debugging and collaboration.
In other words, the main idea behind the QUALIFY clause was to simplify queries that require filtering on the result of window functions; this means QUALIFY is evaluated after window functions are computed.
Here’s the typical order of execution for a query with a QUALIFY statement clause:
- FROM
- WHERE
- GROUP BY and Aggregation
- HAVING
- WINDOW
- QUALIFY
- DISTINCT
- ORDER BY
- LIMIT
The general syntax of a QUALIFY statement is as follows:
QUALIFY <predicate>
In its general form, it would look something like this:
SELECT <column_list>
FROM <data_source>
[GROUP BY ...]
[HAVING ...]
QUALIFY <predicate>
[ ... ]
Note: <predicate> is an expression used to filter the result after aggregations and window functions are computed.
When to Use SQL QUALIFY
The QUALIFY statement clause is much like the HAVING clause in the sense that it avoids the need for a subquery to perform filtering. For example, you may use QUALIFY to filter the results of an analytic function – a function used to calculate an aggregate value based on a group of rows.
To make it clearer when exactly you would use QUALIFY, here is a short scenario:
Let’s say you’re working as a data scientist, and a stakeholder asks you for each customer's last login information. Your initial query may look something like this:
SELECT
user_id,
ip,
country_code,
os,
RANK() over (
PARTITION BY user_id ORDER BY log_datetime DESC
) as previous_logins
FROM login_logs
WHERE TRUE
This is a good start, but it does not fulfill the stakeholder's requests as it will return all of the logins and not just their most recent one. To return the last login, we need to add a filter.
Newcomers to SQL may fall into the trap of attempting to filter this using the WHERE statement clause, as follows:
SELECT
user_id,
ip,
country_code,
os,
RANK() over (
PARTITION BY user_id ORDER BY log_datetime DESC
) AS previous_logins
FROM login_logs
WHERE TRUE
AND last_login = 1
Unfortunately, this code will not execute; the same is true if you attempt GROUP BY and HAVING.
The reason this occurs is all down to the order of operations we discussed in the previous section.
WHERE, GROUP BY, and HAVING are all evaluated before WINDOW functions; this means they can not filter WINDOW functions because they don’t know they exist – the WINDOW function has not been evaluated at the time those three clauses are run.
One way to make the filter come after the WINDOW function is to use a Common Table Expression (CTE).
Here’s how our code will look:
WITH
logins AS (
SELECT
user_id,
ip,
country_code,
os,
RANK() OVER (
PARTITION BY user_id ORDER BY log_datetime DESC
) AS previous_logins
FROM login_logs
WHERE TRUE
)
SELECT user_id, ip, country_code, os
FROM logins
WHERE previous_logins = 1
Technically speaking, this code is valid – It will execute perfectly fine and return the results you’re stakeholder is after.
However, we had to introduce two queries, and there are some extra lines of code, which can become redundant if you’re repeating this process regularly.
The best workaround for this problem is to use QUALIFY.
Here’s how the code will look:
-- Starter code from @Jiho Choi on StackOverflow
SELECT
user_id,
ip,
country_code,
os,
RANK() over (
PARTITION BY user_id ORDER BY log_datetime DESC
) as previous_logins
FROM login_logs
WHERE TRUE
QUALIFY previous_logins = 1
This works solution works because QUALIFY clauses are evaluated after WINDOW functions in SQLs order of operations, which means they are aware of their existence so they can filter them in the same query.
You may be wondering what the benefits QUALIFY are other than writing fewer lines of code, and the answer is that there aren’t really any performance benefits.
The CTE solution we used and the QUALIFY solution both execute in similar timeframes, so we can not say there’s much of a performance enhancement from the QUALIFY solution.
The main benefit is the quality of life improvement; there’s less code, and easier to read.
Comparing SQL QUALIFY With Other Filtering Methods
By now, you understand the QUALIFY statement clause is another filtering method available in SQL. You also know the best time to use the QUALIFY clause.
But what’s the difference between QUALIFY and the other filtering methods?
Let’s recap each filtering method.
The WHERE clause
WHERE is used to filter records in a table; this means it implements row operations. We use it when we want to extract the records that meet a specific condition. For example, we may want to filter a dataset by someone's age. Here’s how it will look in code:
SELECT name, gender, height, weight, age
FROM gym_members_info
WHERE age < 23;
Here are the main things to note about WHERE clause.
- It’s used to filter records based on a specific condition.
- Operations are implemented on the rows.
- You can use WHERE clause without a GROUP BY clause
- WHERE can be used with SELECT, UPDATE, DELETE statements.
Let’s compare this to the HAVING clause.
The HAVING clause
HAVING is used to filter records from groups defined by a GROUP BY clause based on a specific condition. Thus, the most telling difference between WHERE and HAVING is that HAVING requires a GROUP BY clause to be present for it to be evaluated successfully.
The main benefit of this is you can apply your HAVING clause to a subset of aggregated groups, which is not possible in a WHERE block.
For example:
SELECT gender, AVG(height)
FROM gym_members_info
GROUP BY gender
HAVING AVG(height) > 170
The code above will return the gender groups where the average height is greater than 170 centimeters; in other words, HAVING implements a column operation.
Another thing to be aware of is that HAVING can only be used with the SELECT statement, unlike WHERE.
The QUALIFY clause
QUALIFY is the equivalent of HAVING except it performs filters on the results of WINDOW functions, whereas HAVING performs filters on aggregate functions and GROUP BY clauses – learn more about How to use GROUP BY.
If we wanted to know information about the tallest person within each age group, we could use a WINDOW function to perform the calculation and a QUALIFY clause to filter the results so we only discover the tallest person from each age group.
Here’s how it would look in code:
SELECT name, gender, height, weight, age
RANK() over (
PARTITION BY age ORDER BY height DESC
) as ranked_ages
FROM gym_members_info
WHERE TRUE
QUALIFY ranked_ages = 1
Essentially, the QUALIFY clause specifies a conditional expression used to filter the output of an ordered analytical function that has already been computed in accordance with user-specified criteria.
Conclusion and Further Study
SQL plays a major role in the typical data science workflow; knowing how to implement useful operations in SQL, like the QUALIFY clause, is extremely helpful as it enables you to write cleaner code, which makes it more pleasant for others to collaborate with you.
If you would like to master your SQL skills, we recommend you check out the resources below:
- SQL Basics Cheat Sheet
- Introduction to SQL course
- SQL Fundamentals skill track
Become a Data Engineer
