Set Operators in SQL: A Comprehensive Guide

Set operations in SQL are techniques for combining or comparing the results of two or more SELECT statements.
May 2024  · 9 min read

Set operations form the foundation of SQL and enable us to combine, compare, and filter data from multiple sources.

These operations are indispensable for tasks ranging from data integration and cleansing to advanced analytics and reporting.

In this tutorial, we’ll learn what set operators are, how they are used in SQL, their practical applications, and more!

If you’re looking for a complete SQL learning resource, check out this seven-course SQL Fundamentals skill track.

For those of you in a hurry, let’s start with a very short answer on what set operations in SQL are.

Short Answer: What Are Set Operations in SQL?

Set operations in SQL are techniques for combining or comparing the results of two or more SELECT statements. They act like mathematical set operations, letting us find the union, intersection, or difference between the rows returned by our queries. This makes them indispensable when analyzing data from multiple sources or perspectives.

Here's a quick overview of the core set operations:

  • UNION: Merges all unique rows from two or more SELECT statements, eliminating duplicates.
  • UNION ALL: Merges all rows from two or more SELECT statements, keeping duplicates.
  • INTERSECT: Returns only the rows that appear in both SELECT statements.
  • EXCEPT: Returns rows from the first SELECT statement that don't appear in the second.

Understanding Set Operators

Set operators are SQL keywords that combine or compare the result sets of multiple SELECT queries. They enable us to find the union (all rows from both sets), intersection (rows shared by both sets), and difference (rows in the first set but not in the second) between datasets.

Set operators are essential tools for data manipulation and analysis within SQL. They provide a powerful means of querying and processing data from relational databases.

Comparison with relational algebra operations

Relational algebra is a theoretical framework that provides a foundation for understanding database queries. It offers abstract operations like projection, selection, and join, rooted in mathematical principles and independent of specific database systems. Think of it as the "behind-the-scenes" logic that powers our database interactions.

Set operators in SQL offer a practical implementation of these concepts within a database environment. They enable us to perform set operations like union, intersection, and difference directly on the result sets of SQL queries.

While relational algebra provides a formal foundation for database operations, set operators in SQL offer a standardized, user-friendly interface for data manipulation tasks.

Understanding the relationship between set operators and relational algebra operations provides insights into the theoretical underpinnings of SQL. It enables users to leverage these concepts effectively in database querying and analysis tasks.

Types of Set Operators in SQL

There are three primary set operators in SQL:

  • UNION
  • INTERSECT
  • EXCEPT (or MINUS in some dialects)

These operators mathematically correspond to the concepts of union, intersection, and set difference.

Let’s dive into more detail about each.

The UNION operator

The UNION operator combines the results of two or more SELECT queries into a single result set, removing duplicate rows by default.

For example, suppose we have two tables, employees and contractors, each with the same columns: name, department, and salary. For learning purposes, let’s consider these two dummy tables:

name  | department  | salary
------|-------------|-------
Alice | Marketing   | 65000
Bob   | Sales       | 70000
Carol | Engineering | 80000
John  | HR          | 55000

Employees

name  | department  | salary
------|-------------|-------
David | Marketing   | 60000
Eva   | Sales       | 68000
Carol | Engineering | 75000

Contractors

We can combine the results from both tables using the following command:

-- Using UNION to combine employees and contractors
SELECT name, department, salary FROM employees
UNION
SELECT name, department, salary FROM contractors;

This query selects the name, department, and salary columns from both the employees and contractors tables and combines them into a single result set. The UNION operator automatically removes duplicate rows from the final result set.

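name  | department  | salary
------|-------------|-------
Alice | Marketing   | 65000
Bob   | Sales       | 70000
Carol | Engineering | 80000
John  | HR          | 55000
David | Marketing   | 60000
Eva   | Sales       | 68000
Carol | Engineering | 75000

Notice that Carol appears twice. UNION compares entire rows, and because her salary differs between the two tables (80000 versus 75000), the two rows are not duplicates. If we selected only the name and department columns, her two rows would be identical: UNION would list her once, while UNION ALL would keep both instances.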
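To make the difference between UNION and UNION ALL concrete, here is a minimal sketch that loads the two dummy tables into an in-memory SQLite database. Python's built-in sqlite3 module is used only as a convenient way to run the SQL; the variable names are illustrative, and other database systems may return rows in a different order unless ORDER BY is specified.

```python
import sqlite3

# In-memory SQLite database holding the tutorial's two sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
""")

# UNION: Carol's two (name, department) rows are identical, so one is dropped.
union_rows = conn.execute("""
    SELECT name, department FROM employees
    UNION
    SELECT name, department FROM contractors
    ORDER BY name
""").fetchall()

# UNION ALL: every row from both queries is kept, duplicates included.
union_all_rows = conn.execute("""
    SELECT name, department FROM employees
    UNION ALL
    SELECT name, department FROM contractors
    ORDER BY name
""").fetchall()

# With salary included, Carol's two rows differ, so UNION keeps both.
union_with_salary = conn.execute("""
    SELECT name, department, salary FROM employees
    UNION
    SELECT name, department, salary FROM contractors
    ORDER BY name, salary
""").fetchall()

print(len(union_rows), len(union_all_rows), len(union_with_salary))  # 6 7 7
```

Which columns we select determines what counts as a duplicate, so the same pair of tables can yield six or seven rows under UNION.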

It's important to remember that the UNION operator does not remove NULL values. A row that contains NULL is kept in the result like any other row. When eliminating duplicates, however, set operators treat two NULLs in corresponding columns as equal, so two rows that match in every column, NULLs included, collapse into a single row.

If we want to keep every row, including such duplicates, we can use the UNION ALL operator instead. This operator combines the results of multiple SELECT queries and includes all rows from each result set, regardless of whether they are duplicates or contain NULL values.
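The NULL behavior can be checked with a small sketch. The staff_a and staff_b tables below are hypothetical and exist only for this illustration; the SQL runs in SQLite through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, not part of the tutorial's dataset.
    CREATE TABLE staff_a (name TEXT, department TEXT);
    CREATE TABLE staff_b (name TEXT, department TEXT);
    INSERT INTO staff_a VALUES ('Frank', NULL), ('Grace', 'Sales');
    INSERT INTO staff_b VALUES ('Frank', NULL), ('Grace', NULL);
""")

# UNION keeps rows containing NULL, but the two identical ('Frank', NULL)
# rows count as duplicates and collapse into one.
union_rows = conn.execute("""
    SELECT name, department FROM staff_a
    UNION
    SELECT name, department FROM staff_b
""").fetchall()

# UNION ALL keeps all four rows.
union_all_rows = conn.execute("""
    SELECT name, department FROM staff_a
    UNION ALL
    SELECT name, department FROM staff_b
""").fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```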

The INTERSECT operator

The INTERSECT operator returns only the rows that appear in both result sets. Think of it as finding the people who belong to both groups.

Let’s use INTERSECT to query our tables above. For the sake of the example, let’s just query the name and department columns:

-- Using INTERSECT to find common employees
SELECT name, department FROM employees
INTERSECT
SELECT name, department FROM contractors;

This query selects the name and department columns from both the employees and contractors tables and returns only the rows that exist in both tables based on all selected columns.

name

department

Carol

Engineering

The INTERSECT operator treats NULL values as equal when comparing corresponding columns. This differs from an ordinary comparison with =, where NULL = NULL evaluates to unknown rather than true. It also returns an empty set when either input is empty.

In other words, two rows that both contain NULL in the same column can match. But if a NULL value is present in one result set and the corresponding column in the other result set contains a non-NULL value, the rows are not considered equal, and they will not be included in the intersection result.

Additionally, if one of the result sets provided to the INTERSECT operator is empty (i.e., it contains no rows), the overall result of the INTERSECT operation will also be empty, since there are no common rows between an empty set and any other set.
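Both rules can be observed in a short sketch. The staff_a and staff_b tables are hypothetical, and the SQL is run in SQLite via Python's built-in sqlite3 module; the WHERE 1 = 0 filter is simply a way to produce an empty result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables, not part of the tutorial's dataset.
    CREATE TABLE staff_a (name TEXT, department TEXT);
    CREATE TABLE staff_b (name TEXT, department TEXT);
    INSERT INTO staff_a VALUES ('Frank', NULL), ('Grace', 'Sales');
    INSERT INTO staff_b VALUES ('Frank', NULL), ('Grace', NULL);
""")

# ('Frank', NULL) is in both tables; the NULLs are treated as equal, so it matches.
# ('Grace', 'Sales') and ('Grace', NULL) differ (non-NULL vs NULL), so no match.
common_rows = conn.execute("""
    SELECT name, department FROM staff_a
    INTERSECT
    SELECT name, department FROM staff_b
""").fetchall()

# Intersecting with an empty result set yields no rows at all.
empty_rows = conn.execute("""
    SELECT name, department FROM staff_a
    INTERSECT
    SELECT name, department FROM staff_b WHERE 1 = 0
""").fetchall()

print(common_rows)  # [('Frank', None)]
print(empty_rows)   # []
```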

The EXCEPT (MINUS) operator

The EXCEPT operator retrieves the rows present in the first result set but not in the second.

For example, let’s say we execute the following query:

-- Using EXCEPT to find employees who are not contractors
SELECT name, department, salary FROM employees
EXCEPT
SELECT name, department, salary FROM contractors;

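This query selects the name, department, and salary columns from the employees table and returns only the rows that have no identical row in the contractors table.

name  | department  | salary
------|-------------|-------
Alice | Marketing   | 65000
Bob   | Sales       | 70000
Carol | Engineering | 80000
John  | HR          | 55000

Carol remains in the result because EXCEPT compares entire rows, and her salary in employees (80000) differs from her salary in contractors (75000). To exclude everyone who also appears in contractors regardless of salary, we would select only the name and department columns in both queries, which returns Alice, Bob, and John.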

The EXCEPT operator also treats NULL values in corresponding columns as equal when comparing rows. With empty result sets, it returns an empty set if the first result set is empty, and it returns all distinct rows from the first result set if the second result set is empty.
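The empty-set rules for EXCEPT can be sketched against the same dummy tables. As before, SQLite (through Python's built-in sqlite3 module) is just one convenient engine for running the SQL, and the run helper and variable names are illustrative. In dialects that use MINUS, the keyword would change but the logic would stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Comparing on name and department only: Carol matches and is removed.
not_contractors = run("""
    SELECT name, department FROM employees
    EXCEPT
    SELECT name, department FROM contractors
    ORDER BY name
""")

# Second result set empty: every distinct row of the first set survives.
second_empty = run("""
    SELECT name FROM employees
    EXCEPT
    SELECT name FROM contractors WHERE 1 = 0
    ORDER BY name
""")

# First result set empty: there is nothing to return.
first_empty = run("""
    SELECT name FROM employees WHERE 1 = 0
    EXCEPT
    SELECT name FROM contractors
""")

print(not_contractors)  # [('Alice', 'Marketing'), ('Bob', 'Sales'), ('John', 'HR')]
print(second_empty)     # [('Alice',), ('Bob',), ('Carol',), ('John',)]
print(first_empty)      # []
```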

Set Operators: Performance and Optimization

The impact of set operators on query performance in SQL can vary depending on factors such as the size of the datasets involved, the complexity of the queries, and the database management system (DBMS) used.

Let's break down the key factors and strategies for optimization.

Data volume and query complexity

When working with large amounts of data, set operators can significantly impact query performance because the size of the result sets that need to be combined, intersected, or compared increases the processing time required to perform that operation.

Complex queries containing multiple subqueries, joins, or set operators may result in additional processing overhead and impact query performance. Chained operations or nested set operations could further exacerbate performance consequences.

Indexing and optimization techniques

Proper indexing of the columns involved in set operations can significantly improve query performance. Indexes help the database engine quickly locate and retrieve the relevant rows, reducing the need for full-table scans and improving query execution times.

To enhance the performance of queries involving set operators, database administrators and developers can use optimization techniques like query rewriting, query plan analysis, and database schema optimization. Techniques like query caching and materialized views can also be used to precompute and store the results of complex queries, reducing the computational overhead of set operations.
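As one illustration of indexing and query rewriting, the sketch below adds an index on the compared columns and expresses an INTERSECT as a semi-join with EXISTS. The index name is hypothetical, and whether the rewrite is actually faster depends on the engine and the data, so it is worth comparing query plans rather than assuming a gain. The two forms are only equivalent when the compared columns contain no NULLs, because = does not treat NULLs as equal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
    -- Hypothetical index covering the columns used in the comparison.
    CREATE INDEX idx_contractors_name_dept ON contractors (name, department);
""")

# Original form: INTERSECT on name and department.
intersect_rows = conn.execute("""
    SELECT name, department FROM employees
    INTERSECT
    SELECT name, department FROM contractors
""").fetchall()

# Rewritten form: a semi-join with EXISTS; DISTINCT mirrors the
# duplicate removal that INTERSECT performs.
exists_rows = conn.execute("""
    SELECT DISTINCT e.name, e.department
    FROM employees AS e
    WHERE EXISTS (
        SELECT 1 FROM contractors AS c
        WHERE c.name = e.name AND c.department = e.department
    )
""").fetchall()

print(intersect_rows)  # [('Carol', 'Engineering')]
print(exists_rows)     # [('Carol', 'Engineering')]
```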

Database engine and hardware resources

The performance of set operations may vary depending on the underlying database engine and its optimization capabilities. Different DBMSs may employ different optimization strategies and algorithms for processing set operations, leading to variations in performance.

The availability of hardware resources such as CPU, memory, and disk I/O also influences the performance of queries involving set operators. Adequate hardware resources can help mitigate performance bottlenecks and ensure efficient query execution.

Set Operators in Practice: A Case Study

Set operators aren't just theoretical tools; they have real-world applications that can significantly impact business decisions. Let's walk through a simplified example of how a company might use set operators to segment its customer base for targeted marketing campaigns.

The scenario

Imagine a company that sells both online and in physical stores. They have two separate datasets:

  1. Online purchases: Customer ID, purchase history, demographics, and location for online shoppers.
  2. In-store transactions: Similar information for customers who shopped in person.

Using set operators

To get a complete picture of all customers, the company would first use UNION to combine the customer IDs from both datasets into a single result set, removing any duplicates. This gives them a unified view of their entire customer base.

Next, they could use INTERSECT to identify customers who have shopped both online and in-store. This segment is particularly valuable as they are highly engaged with the brand across multiple channels.

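To find opportunities for cross-channel promotion, the company could use EXCEPT. For example, assuming the customer ID is stored in a column such as customer_id, SELECT customer_id FROM online_purchases EXCEPT SELECT customer_id FROM in_store_transactions would find customers who have shopped online but never in stores. Comparing only the customer ID matters here: with SELECT *, rows are compared on every column, so a customer would be excluded only if an identical row existed in both tables. The company could then target these customers with promotions encouraging them to visit a physical location.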
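The three segmentation steps can be sketched end to end. The table layouts, the customer_id and item columns, and the sample rows below are assumptions made for illustration, and the SQL is run in SQLite through Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas and data for the two sales channels.
    CREATE TABLE online_purchases      (customer_id INTEGER, item TEXT);
    CREATE TABLE in_store_transactions (customer_id INTEGER, item TEXT);
    INSERT INTO online_purchases VALUES
        (1, 'lamp'), (2, 'desk'), (3, 'chair'), (3, 'rug');
    INSERT INTO in_store_transactions VALUES
        (3, 'chair'), (4, 'sofa');
""")

def customer_ids(operator):
    """Combine the two customer_id lists with the given set operator."""
    query = f"""
        SELECT customer_id FROM online_purchases
        {operator}
        SELECT customer_id FROM in_store_transactions
        ORDER BY customer_id
    """
    return [row[0] for row in conn.execute(query)]

all_customers = customer_ids("UNION")      # anyone who bought anywhere
both_channels = customer_ids("INTERSECT")  # bought online and in store
online_only = customer_ids("EXCEPT")       # bought online, never in store

print(all_customers)  # [1, 2, 3, 4]
print(both_channels)  # [3]
print(online_only)    # [1, 2]
```

Customer 3 bought twice online but appears once in each segment, because all three operators remove duplicates.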

Beyond segmentation

With these segments identified, the company can go further, refining them based on additional factors like demographics or purchase history. This granular understanding of their customers allows them to tailor marketing campaigns with greater precision.

Limitations and Considerations

When using set operators in SQL, it's essential to consider several limitations and factors that can affect query performance, result accuracy, and overall usability.

Data type compatibility and NULL values

Corresponding columns in the result sets must have compatible data types. Ensure data consistency and compatibility across result sets to avoid errors and unexpected results.

Set operators may treat NULL values differently depending on the DBMS and the specific operator. To avoid errors, developers must understand how NULL values are treated.

Performance impact and duplicate rows

Set operations can significantly impact query performance, especially when dealing with large or complex datasets. Factors like indexing, query optimization, and hardware resources can influence performance. Optimization techniques and performance-tuning strategies are essential to mitigate performance bottlenecks.

By default, set operators remove duplicate rows from the result set. However, in some cases, retaining duplicate rows may be necessary. UNION ALL keeps duplicates, and some database systems also support INTERSECT ALL and EXCEPT ALL. It's important to understand how each operator handles duplicate rows and to choose the variant that matches the intended result.

Ordering of results and memory constraints

Set operators do not guarantee the order of the rows in the final output. If the order of results is essential, add an ORDER BY clause at the end of the combined statement; it sorts the result of the whole set operation rather than only the last SELECT.
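A minimal sketch of sorting a combined result, using the employees and contractors dummy tables from earlier in this tutorial and running the SQL in SQLite via Python's built-in sqlite3 module. The single ORDER BY at the end applies to the whole UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
""")

# The trailing ORDER BY sorts the combined rows, highest salary first.
ranked = conn.execute("""
    SELECT name, salary FROM employees
    UNION ALL
    SELECT name, salary FROM contractors
    ORDER BY salary DESC, name
""").fetchall()

for name, salary in ranked:
    print(name, salary)
```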

Set operations can consume significant memory and resources, particularly when dealing with large datasets. Memory constraints and resource limitations must be considered to avoid performance degradation or system instability.

Complexity, maintainability, and cross-DBMS compatibility

Complex queries involving multiple set operators, subqueries, and joins can be challenging to understand, maintain, and debug. To improve readability and maintainability, queries must be concise, well-documented, and modular.

Set operators may have variations in syntax and behavior across different database management systems (DBMSs). Knowing these differences is vital when writing SQL queries for cross-platform compatibility.

Conclusion

In summary, set operators are core components of SQL and indispensable for efficient data manipulation.

They encompass operators like UNION, INTERSECT, and EXCEPT and empower us with versatile data analysis capabilities.

If you want to learn more, check out this course on Joining Data in SQL.


Author
Kurtis Pykes

SQL Operators FAQs

What is the difference between UNION ALL and UNION?

UNION ALL includes all rows from both queries, even if there are duplicates. UNION eliminates duplicate rows.

How is UNION different from JOIN in SQL?

UNION combines the results of queries vertically, appending rows from one query to another. JOIN combines tables horizontally, matching rows based on a related column and creating a wider result set.
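The vertical versus horizontal distinction can be seen in a short sketch that uses the employees and contractors dummy tables from this tutorial (run in SQLite via Python's built-in sqlite3 module). Joining on department is an arbitrary choice made only for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (name TEXT, department TEXT, salary INTEGER);
    CREATE TABLE contractors (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Alice', 'Marketing', 65000), ('Bob', 'Sales', 70000),
        ('Carol', 'Engineering', 80000), ('John', 'HR', 55000);
    INSERT INTO contractors VALUES
        ('David', 'Marketing', 60000), ('Eva', 'Sales', 68000),
        ('Carol', 'Engineering', 75000);
""")

# UNION ALL stacks rows: same two columns, more rows (4 + 3 = 7).
stacked = conn.execute("""
    SELECT name, department FROM employees
    UNION ALL
    SELECT name, department FROM contractors
""").fetchall()

# JOIN widens rows: each employee is paired with a contractor
# from the same department, side by side in one row.
joined = conn.execute("""
    SELECT e.department, e.name AS employee, c.name AS contractor
    FROM employees AS e
    JOIN contractors AS c ON c.department = e.department
    ORDER BY e.department
""").fetchall()

print(len(stacked), len(stacked[0]))  # 7 2
print(joined)
```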

Are there any performance considerations when using set operations?

Set operations can be computationally expensive, especially when dealing with large datasets. It's important to optimize the individual queries and use indexes where possible to improve performance.
