Set Operators in SQL: A Comprehensive Guide

Set operations in SQL are techniques for combining or comparing the results of two or more SELECT statements.

May 2024 · 9 min read

Set operations form the foundation of SQL and enable us to combine, compare, and filter data from multiple sources.

These operations are indispensable for tasks ranging from data integration and cleansing to advanced analytics and reporting.

In this tutorial, we’ll learn what set operators are, how they are used in SQL, their practical applications, and more!

If you’re looking for a complete SQL learning resource, check out this seven-course SQL Fundamentals skill track.

For those of you in a hurry, let’s start with a very short answer on what set operations in SQL are.

Short Answer: What Are Set Operations in SQL?

Set operations in SQL are techniques for combining or comparing the results of two or more SELECT statements. They act like mathematical set operations, letting us find the union, intersection, or difference between the rows returned by our queries. This makes them indispensable when analyzing data from multiple sources or perspectives.

Here's a quick overview of the core set operations:

UNION: Merges all unique rows from two or more SELECT statements, eliminating duplicates.
UNION ALL: Merges all rows from two or more SELECT statements, keeping duplicates.
INTERSECT: Returns only the rows that appear in both SELECT statements.
EXCEPT: Returns rows from the first SELECT statement that don't appear in the second.

Understanding Set Operators

Set operators are keywords that combine the result sets of multiple SELECT queries. They let us compute the union (rows from either query), the intersection (rows common to both), and the difference (rows in the first query but not the second) of those result sets.

Set operators are essential tools for data manipulation and analysis within SQL. They provide a powerful means of querying and processing data from relational databases.

Comparison with relational algebra operations

Relational algebra is a theoretical framework that provides a foundation for understanding database queries. It offers abstract operations like projection, selection, and join, rooted in mathematical principles and independent of specific database systems. Think of it as the "behind-the-scenes" logic that powers our database interactions.

Union, intersection, and difference are themselves relational algebra operations. Set operators in SQL are their practical implementation within a database environment, applied directly to the result sets of SQL queries.

While relational algebra provides a formal foundation for database operations, set operators in SQL offer a standardized, user-friendly interface for data manipulation tasks.

Understanding the relationship between set operators and relational algebra operations provides insights into the theoretical underpinnings of SQL. It enables users to leverage these concepts effectively in database querying and analysis tasks.

Types of Set Operators in SQL

There are three primary set operators in SQL:

UNION
INTERSECT
EXCEPT (or MINUS in some dialects)

These operators mathematically correspond to the concepts of union, intersection, and set difference.
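To make the correspondence concrete, here is a minimal sketch using Python's built-in set type, with rows modeled as tuples. The sample rows are illustrative only, and Python sets stand in for duplicate-free result sets; this is an analogy, not how a database executes these operators.

```python
# Rows modeled as tuples; each Python set stands in for a duplicate-free result set.
employees = {("Alice", "Marketing"), ("Bob", "Sales"), ("Carol", "Engineering")}
contractors = {("David", "Marketing"), ("Carol", "Engineering")}

union = employees | contractors         # analogous to UNION
intersection = employees & contractors  # analogous to INTERSECT
difference = employees - contractors    # analogous to EXCEPT

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

As with EXCEPT, the difference is not symmetric: swapping the operands would return David instead of Alice and Bob.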

Let’s dive into more detail about each.

The UNION operator

The UNION operator combines the results of two or more SELECT queries into a single result set, removing duplicate rows by default.

For example, suppose we have two tables, employees and contractors, each with the same columns: name, department, and salary. For learning purposes, let’s consider these two dummy tables:

name	department	salary
Alice	Marketing	65000
Bob	Sales	70000
Carol	Engineering	80000
John	HR	55000

Employees

name	department	salary
David	Marketing	60000
Eva	Sales	68000
Carol	Engineering	75000

Contractors

We can combine the results from both tables using the following command:

-- Using UNION to combine employees and contractors
SELECT name, department, salary FROM employees
UNION
SELECT name, department, salary FROM contractors;

This query selects the name, department, and salary columns from both the employees and contractors tables and combines them into a single result set. The UNION operator automatically removes duplicate rows from the final result set.

name	department	salary
Alice	Marketing	65000
Bob	Sales	70000
Carol	Engineering	80000
John	HR	55000
David	Marketing	60000
Eva	Sales	68000
Carol	Engineering	75000

Notice that Carol appears twice. UNION compares entire rows, and her salary differs between the two tables, so the two rows are not duplicates. If we selected only the name and department columns, Carol would be listed once; to keep both copies even then, we would use UNION ALL.

It's important to remember that the UNION operator does not drop rows just because they contain NULL values. When checking for duplicates, it treats two NULLs in the same column as equal, so rows that match in every column, NULLs included, collapse into a single row.

If we want to keep every row from each query, including exact duplicates, we can use the UNION ALL operator instead. It skips the duplicate check entirely, which usually also makes it cheaper to run.
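The behavior of UNION and UNION ALL can be checked with a small sketch. It uses Python's standard sqlite3 module and an in-memory SQLite database loaded with the two dummy tables from this section; other database systems should behave the same way for these queries, but that is an assumption worth verifying on your own engine. The 'Zed' rows are made up purely to show NULL handling.

```python
import sqlite3

# In-memory SQLite database holding the article's two sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Two columns: Carol's rows are identical, so UNION keeps only one of them.
union_two = run("SELECT name, department FROM employees "
                "UNION SELECT name, department FROM contractors")
# UNION ALL keeps every row, including the repeated Carol row.
union_all_two = run("SELECT name, department FROM employees "
                    "UNION ALL SELECT name, department FROM contractors")
# Three columns: Carol's salaries differ, so both rows survive UNION.
union_three = run("SELECT name, department, salary FROM employees "
                  "UNION SELECT name, department, salary FROM contractors")

# Rows that match in every column, NULLs included, count as duplicates.
null_union = run("SELECT 'Zed' AS name, NULL AS department UNION SELECT 'Zed', NULL")
null_union_all = run("SELECT 'Zed' AS name, NULL AS department UNION ALL SELECT 'Zed', NULL")

print(len(union_two), len(union_all_two), len(union_three))
print(null_union, null_union_all)
```

The row counts make the point: six rows for the two-column UNION, seven for UNION ALL, and seven for the three-column UNION, because the salary column makes Carol's two rows distinct.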

The INTERSECT operator

The INTERSECT operator returns only the rows that appear in both result sets. Think of it as finding the people who belong to both groups.

Let’s use INTERSECT to query our tables above. For the sake of the example, let’s just query the name and department columns:

-- Using INTERSECT to find common employees
SELECT name, department FROM employees
INTERSECT
SELECT name, department FROM contractors;

This query selects the name and department columns from both the employees and contractors tables and returns only the rows that exist in both tables based on all selected columns.

name	department
Carol	Engineering

The INTERSECT operator does not use the ordinary = comparison for NULL values. Like the other set operators, it treats two NULLs in corresponding columns as equal, so rows that match in every column, NULLs included, count as common rows.

However, if a NULL value is present in one result set and the corresponding column in the other result set contains a non-NULL value, the rows are not considered equal, and they will not be included in the intersection result.

Additionally, if one of the result sets provided to the INTERSECT operator is empty (i.e., it contains no rows), the overall result will also be empty, since an empty set has no rows in common with any other set.
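The NULL and empty-set behavior of INTERSECT can be sketched as follows, again using Python's sqlite3 module with an in-memory SQLite database. The tables staff_a, staff_b, and no_rows and their contents are hypothetical, invented only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff_a (name TEXT, department TEXT);
    CREATE TABLE staff_b (name TEXT, department TEXT);
    CREATE TABLE no_rows (name TEXT, department TEXT);
    INSERT INTO staff_a VALUES ('Carol', 'Engineering'), ('Dana', NULL), ('Eli', NULL);
    INSERT INTO staff_b VALUES ('Carol', 'Engineering'), ('Dana', NULL), ('Eli', 'Sales');
""")

# Dana matches because both rows have NULL in department.
# Eli does not match: NULL on one side, 'Sales' on the other.
common = conn.execute("""
    SELECT name, department FROM staff_a
    INTERSECT
    SELECT name, department FROM staff_b
    ORDER BY name
""").fetchall()

# Intersecting with an empty result set always yields an empty result.
with_empty = conn.execute("""
    SELECT name, department FROM staff_a
    INTERSECT
    SELECT name, department FROM no_rows
""").fetchall()

print(common)
print(with_empty)
```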

The EXCEPT (MINUS) operator

The EXCEPT operator retrieves the rows present in the first result set but not in the second.

For example, let’s say we execute the following query:

-- Using EXCEPT to find employees who are not contractors
SELECT name, department FROM employees
EXCEPT
SELECT name, department FROM contractors;

This query selects the name and department columns from the employees table and returns only the rows that do not also appear in the contractors table. Carol is excluded because she appears in both tables. Had we included the salary column, her two rows would differ and she would remain in the result.

name	department
Alice	Marketing
Bob	Sales
John	HR

The EXCEPT operator treats NULL values the same way as the other set operators: two NULLs in corresponding columns count as equal. If the first result set is empty, the result is empty; if the second result set is empty, the result contains every distinct row from the first.
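Here is a minimal sketch of EXCEPT on the two dummy tables, using Python's sqlite3 module with an in-memory SQLite database. It also shows the effect of comparing whole rows and the two empty-set cases; the no_rows table is a hypothetical empty table added for that purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE no_rows (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
""")

def names(sql):
    """Run a query and return the first column of each row, sorted by name."""
    return [row[0] for row in conn.execute(sql + " ORDER BY name")]

# Comparing name and department only: Carol is in both tables, so she is removed.
two_columns = names("SELECT name, department FROM employees "
                    "EXCEPT SELECT name, department FROM contractors")

# Comparing salary as well: Carol's rows differ, so she stays in the result.
three_columns = names("SELECT name, department, salary FROM employees "
                      "EXCEPT SELECT name, department, salary FROM contractors")

# Empty second set: every distinct row of the first set is returned.
empty_second = names("SELECT name FROM employees EXCEPT SELECT name FROM no_rows")
# Empty first set: the result is empty.
empty_first = names("SELECT name FROM no_rows EXCEPT SELECT name FROM employees")

print(two_columns)
print(three_columns)
print(empty_second, empty_first)
```

Choosing which columns to compare is the main design decision with EXCEPT: the fewer columns selected, the looser the definition of "the same row".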

Set Operators: Performance and Optimization

The impact of set operators on query performance in SQL can vary depending on factors such as the size of the datasets involved, the complexity of the queries, and the database management system (DBMS) used.

Let's break down the key factors and strategies for optimization.

Data volume and query complexity

When working with large amounts of data, set operators can significantly impact query performance: the larger the result sets that need to be combined, intersected, or compared, the longer the operation takes.

Complex queries containing multiple subqueries, joins, or set operators may result in additional processing overhead and impact query performance. Chained operations or nested set operations could further exacerbate performance consequences.

Indexing and optimization techniques

Proper indexing of the columns involved in set operations can significantly improve query performance. Indexes help the database engine quickly locate and retrieve the relevant rows, reducing the need for full-table scans and improving query execution times.

To enhance the performance of queries involving set operators, database administrators and developers can use optimization techniques like query rewriting, query plan analysis, and database schema optimization. Techniques like query caching and materialized views can also be used to precompute and store the results of complex queries, reducing the computational overhead of set operations.
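As one way to approach query plan analysis, the sketch below asks SQLite for its plan for a UNION and a UNION ALL query via EXPLAIN QUERY PLAN, using Python's sqlite3 module. The index name idx_employees_dept is hypothetical, and the plan text differs between SQLite versions and between database systems, so treat the printed output as something to inspect rather than a fixed result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT);
    CREATE TABLE contractors (name TEXT, department TEXT);
    -- Hypothetical index on the column used in the WHERE clause below.
    CREATE INDEX idx_employees_dept ON employees (department);
""")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

union_plan = plan(
    "SELECT name FROM employees WHERE department = 'Sales' "
    "UNION SELECT name FROM contractors"
)
union_all_plan = plan(
    "SELECT name FROM employees WHERE department = 'Sales' "
    "UNION ALL SELECT name FROM contractors"
)

print("UNION plan:")
for step in union_plan:
    print("  ", step)
print("UNION ALL plan:")
for step in union_all_plan:
    print("  ", step)
```

Comparing the two plans is a quick way to see whether the engine adds extra work to remove duplicates and whether an index is used for the filtered SELECT.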

Database engine and hardware resources

The performance of set operations may vary depending on the underlying database engine and its optimization capabilities. Different DBMSs may employ different optimization strategies and algorithms for processing set operations, leading to variations in performance.

The availability of hardware resources such as CPU, memory, and disk I/O also influences the performance of queries involving set operators. Adequate hardware resources can help mitigate performance bottlenecks and ensure efficient query execution.

Set Operators in Practice: A Case Study

Set operators aren't just theoretical tools; they have real-world applications that can significantly impact business decisions. Let's walk through a simplified example of how a company might use set operators to segment its customer base for targeted marketing campaigns.

The scenario

Imagine a company that sells both online and in physical stores. They have two separate datasets:

Online purchases: Customer ID, purchase history, demographics, and location for online shoppers.
In-store transactions: Similar information for customers who shopped in person.

Using set operators

To get a complete picture of all customers, the company would first use UNION on the customer ID column of both datasets. Because UNION removes duplicates, each customer appears only once, giving the company a unified view of its entire customer base.

Next, they could use INTERSECT to identify customers who have shopped both online and in-store. This segment is particularly valuable as they are highly engaged with the brand across multiple channels.

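To find opportunities for cross-channel promotion, the company could use EXCEPT. For example, assuming both tables have a customer_id column, SELECT customer_id FROM online_purchases EXCEPT SELECT customer_id FROM in_store_transactions would find customers who have shopped online but never in stores. (Comparing whole rows with SELECT * would not work here, because a customer's online and in-store purchase rows differ.) The company could then target these customers with promotions encouraging them to visit a physical location.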
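The three segmentation steps can be sketched end to end with Python's sqlite3 module and an in-memory SQLite database. The table names follow the scenario above; the customer_id and item columns and all sample rows are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_purchases (customer_id INTEGER, item TEXT);
    CREATE TABLE in_store_transactions (customer_id INTEGER, item TEXT);
    INSERT INTO online_purchases VALUES
        (1, 'shoes'), (2, 'hat'), (3, 'scarf'), (1, 'socks');
    INSERT INTO in_store_transactions VALUES
        (2, 'gloves'), (3, 'coat'), (4, 'belt');
""")

def customer_ids(sql):
    """Run a set query on customer_id and return the IDs in ascending order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY customer_id")]

# Every customer, listed once regardless of channel or number of purchases.
all_customers = customer_ids(
    "SELECT customer_id FROM online_purchases "
    "UNION SELECT customer_id FROM in_store_transactions"
)
# Customers who shopped through both channels.
both_channels = customer_ids(
    "SELECT customer_id FROM online_purchases "
    "INTERSECT SELECT customer_id FROM in_store_transactions"
)
# Customers who shopped online but never in a store.
online_only = customer_ids(
    "SELECT customer_id FROM online_purchases "
    "EXCEPT SELECT customer_id FROM in_store_transactions"
)

print(all_customers, both_channels, online_only)
```

Selecting only customer_id is what makes the segments meaningful: the comparison is then about who the customer is, not about what they bought.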

Beyond segmentation

With these segments identified, the company can go further, refining them based on additional factors like demographics or purchase history. This granular understanding of their customers allows them to tailor marketing campaigns with greater precision.

Limitations and Considerations

When using set operators in SQL, it's essential to consider several limitations and factors that can affect query performance, result accuracy, and overall usability.

Data type compatibility and NULL values

Corresponding columns in the result sets must have compatible data types. Ensure data consistency and compatibility across result sets to avoid errors and unexpected results.
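The sketch below illustrates both requirements using Python's sqlite3 module. SQLite is loosely typed, so it accepts mismatched types and simply treats the integer 1 and the text '1' as different values; stricter systems are expected to reject such a query or convert the values implicitly, which is an assumption to verify for your own DBMS. A mismatch in the number of columns raises an error in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer 1 and text '1' are different values, so UNION keeps both rows.
mixed = conn.execute("SELECT 1 AS id UNION SELECT '1'").fetchall()

# Casting to a common type makes the rows identical, so UNION keeps one.
aligned = conn.execute(
    "SELECT 1 AS id UNION SELECT CAST('1' AS INTEGER)"
).fetchall()

# Both SELECT statements must return the same number of columns.
try:
    conn.execute("SELECT 1, 2 UNION SELECT 1")
    column_error = None
except sqlite3.OperationalError as exc:
    column_error = str(exc)

print(mixed)
print(aligned)
print(column_error)
```

An explicit CAST in each SELECT is a simple way to keep the comparison predictable when the source columns are stored with different types.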

Set operators generally treat two NULLs as equal when comparing rows, unlike the = operator, but the details can vary by DBMS. To avoid surprises, developers should check how their database handles NULL values.

Performance impact and duplicate rows

Set operations can significantly impact query performance, especially when dealing with large or complex datasets. Factors like indexing, query optimization, and hardware resources can influence performance. Optimization techniques and performance-tuning strategies are essential to mitigate performance bottlenecks.

By default, set operators remove duplicate rows from the result set. However, in some cases, retaining duplicate rows may be necessary. When it is, use UNION ALL; some DBMSs also support INTERSECT ALL and EXCEPT ALL.

Ordering of results and memory constraints

Set operators do not guarantee the order of the results in the final output. If the order matters, add an ORDER BY clause at the end of the combined query; it sorts the final result rather than an individual SELECT.
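A minimal sketch of sorting a combined result, using Python's sqlite3 module and the employees and contractors tables from earlier in this tutorial. The single ORDER BY at the end applies to the whole UNION result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
""")

# ORDER BY comes last and sorts the combined, de-duplicated result.
sorted_rows = conn.execute("""
    SELECT name, department FROM employees
    UNION
    SELECT name, department FROM contractors
    ORDER BY department, name
""").fetchall()

for row in sorted_rows:
    print(row)
```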

Set operations can consume significant memory and resources, particularly when dealing with large datasets. Memory constraints and resource limitations must be considered to avoid performance degradation or system instability.

Complexity, maintainability, and cross-DBMS compatibility

Complex queries involving multiple set operators, subqueries, and joins can be challenging to understand, maintain, and debug. To improve readability and maintainability, queries must be concise, well-documented, and modular.

Set operators may have variations in syntax and behavior across different database management systems (DBMSs). For example, Oracle has traditionally used MINUS where other systems use EXCEPT. Knowing these differences is vital when writing SQL queries for cross-platform compatibility.

Conclusion

In summary, set operators are core components of SQL and indispensable for efficient data manipulation.

They include operators like UNION, INTERSECT, and EXCEPT and give us versatile data analysis capabilities.

If you want to learn more, check out this course on Joining Data in SQL.

Author

Kurtis Pykes
