Course
If you've been working with SQL for a while but haven't used CTEs, you'll likely wonder how you managed without them. I use them pretty much everywhere, including in SELECT, INSERT, UPDATE, and DELETE statements.
In this article, I will go through the basics, including how to create a CTE. I will also go into more advanced things, such as how to differentiate between non-recursive and recursive CTEs, which both serve a purpose.
If you are a bit unfamiliar with SQL operations, try our very popular Introduction to SQL course to get started. The course is well-designed and comprehensive, and it will teach you everything you need to know to extract data using efficient queries.
What is a SQL CTE?
The idea of CTEs will become clear when I show examples. But for now, we can say that a CTE, or common table expression, is a temporary, named result set in SQL that allows you to simplify complex queries, making them easier to read and maintain.
CTEs are commonly used when working with multiple subqueries. You might recognize them because they are created with the distinctive WITH keyword and, like I mentioned, they can be used in SELECT, INSERT, UPDATE, and DELETE statements.
Associate Data Engineer in SQL
How to Create a SQL CTE
When creating a CTE, we use the WITH keyword to initiate the CTE definition. The general syntax of a CTE is as follows:
WITH cte_name (column1, column2, ...) AS (
-- Query that defines the CTE
SELECT ...
FROM ...
WHERE ...
)
-- Main query
SELECT ...
FROM cte_name;
Where:
- WITH: Initiates the CTE definition.
- cte_name: The name assigned to the CTE (used to reference it later).
- Optional column list: Specifies column names for the CTE’s result set.
- Main query: References the CTE by name, treating it like a normal table.
Let us look at an example. Assume we have a table Employees and want to select employees who earn a salary above $50,000.
Step 1: Write the base query
We start by writing the basic SELECT query:
SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees
WHERE Salary > 50000;
Step 2: Wrap the query using the WITH keyword to create a CTE
Use the WITH keyword to give the CTE a name.
WITH HighEarningEmployees AS (
SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees
WHERE Salary > 50000
)
Step 3: Use the CTE in the main query
Finally, refer to the CTE in a SELECT statement by calling the CTE name defined above.
WITH HighEarningEmployees AS (
SELECT EmployeeID, FirstName, LastName, Salary
FROM Employees
WHERE Salary > 50000
)
SELECT EmployeeID, FirstName, LastName
FROM HighEarningEmployees;
To summarize the above steps, we have used the WITH keyword to define the CTE named HighEarningEmployees. The inner query was used to generate the temporary dataset. The main query references the HighEarningEmployees to display the specified columns EmployeeID, FirstName, and LastName.
Why SQL CTEs are Useful
From the above example, you may wonder why we use CTEs when even simple queries yield the same results. The following are the reasons:
Simplify complex queries
CTEs break down complex SQL statements into smaller, more manageable parts, making the code easier to read, write, and maintain.
Suppose we have three tables: Orders, Customers, and Products. We want to find the total revenue generated by each customer who purchased in 2024. When we write the query without using CTE, it looks cluttered and hard to read and understand.
-- Standard SQL: Hard to read nested logic
SELECT
c.CustomerName,
SUM(p.Price * o.Quantity) AS TotalRevenue
FROM Orders o
JOIN Customers c
ON o.CustomerID = c.CustomerID
JOIN Products p
ON o.ProductID = p.ProductID
WHERE EXTRACT(YEAR FROM o.OrderDate) = 2024
GROUP BY c.CustomerName
HAVING SUM(p.Price * o.Quantity) > 1000;
By using a CTE, we can separate the logic into a more readable format. We isolate the "filtering and joining" step first, then perform the aggregation.
-- Standard SQL: Cleaner with CTE
WITH OrderDetails AS (
SELECT
o.OrderID,
c.CustomerName,
p.Price,
o.Quantity,
o.OrderDate
FROM Orders o
JOIN Customers c
ON o.CustomerID = c.CustomerID
JOIN Products p
ON o.ProductID = p.ProductID
WHERE EXTRACT(YEAR FROM o.OrderDate) = 2024
)
-- Main query
SELECT
CustomerName,
SUM(Price * Quantity) AS TotalRevenue
FROM OrderDetails
GROUP BY CustomerName
HAVING SUM(Price * Quantity) > 1000;
Code reusability
CTEs help avoid duplication by allowing the same result set to be reused. If you need to calculate an aggregate (like a sum) and then filter based on that aggregate, a CTE is perfect.
Assume we need to calculate the average and total sales for each product category. We define the calculation once in a CTE:
WITH CategorySales AS (
SELECT
Category,
SUM(SalesAmount) AS TotalSales,
AVG(SalesAmount) AS AverageSales
FROM Products
GROUP BY Category
)
-- Select from the CTE where the pre-calculated TotalSales is high
SELECT
Category,
TotalSales,
AverageSales
FROM CategorySales
WHERE TotalSales > 5000;
Other applications
In addition to simplifying queries and code reusability, CTEs have other uses as well. I'm not able to cover every possible use of CTEs in detail. Our Data Manipulation in SQL course is a great option if you want to keep practicing. However, I'll document some of the main other reasons here:
- Query Organization and Readability: CTEs improve SQL code readability by dividing queries into logical, sequential steps. Each step in the query process can be represented by its own CTE, making the entire query easier to follow.
- Hierarchical Data Traversal: CTEs can help navigate hierarchical relationships, such as organizational structures, parent-child relationships, or any data model that involves nested levels. Recursive CTEs are useful for querying hierarchical data because they allow you to traverse levels iteratively.
- Multi-Level Aggregations: CTEs can help perform aggregations at multiple levels, such as calculating sales figures at different granularities (e.g., by month, quarter, and year). Using CTEs to separate these aggregation steps ensures that each level is calculated independently and logically.
- Combining Data from Multiple Tables: Multiple CTEs can be used to combine data from different tables, making the final combination step more structured. This approach simplifies complex joins and ensures the source data is organized logically for improved readability.
Advanced SQL CTE Techniques
CTEs support advanced SQL techniques, making them versatile and useful for different use cases. The following are some of CTEs' advanced applications.
Multiple CTEs in a single query
You can define multiple CTEs in a single query, which allows for complex transformations and calculations. This method is useful when a problem requires multiple stages of data processing, where each CTE represents a distinct stage.
Suppose we have sales data in a table called Sales and we want to calculate the total sales for each product, identify products with above-average total sales, and rank these products based on their total sales.
WITH ProductSales AS (
-- Step 1: Calculate total sales for each product
SELECT ProductID, SUM(SalesAmount) AS TotalSales
FROM Sales
GROUP BY ProductID
),
AverageSales AS (
-- Step 2: Calculate the average of those totals
-- Note: We can reference the previous CTE (ProductSales) here
SELECT AVG(TotalSales) AS AverageTotalSales
FROM ProductSales
),
HighSalesProducts AS (
-- Step 3: Filter products above the average
SELECT ps.ProductID, ps.TotalSales
FROM ProductSales ps
CROSS JOIN AverageSales av
WHERE ps.TotalSales > av.AverageTotalSales
)
-- Step 4: Rank the results
SELECT
ProductID,
TotalSales,
RANK() OVER (ORDER BY TotalSales DESC) AS SalesRank
FROM HighSalesProducts;
In the above example:
- The first CTE (
ProductSales) calculates the total sales per product. - The second CTE (
AverageSales) computes the average total sales across all products. - The third CTE (
HighSalesProducts) filters for products whose total sales exceed the average. - The final query ranks these products based on their total sales.
CTEs in UPDATE, DELETE, and MERGE Statements
When incorporated into UPDATE, DELETE, and MERGE operations, CTEs can simplify data manipulation tasks, especially when dealing with complex filters or hierarchical data.
Using CTE with an UPDATE statement
Assume we have an Employees table with an EmployeeSalary column. We want to give a 10% raise to all employees who have worked for the company for over 5 years.
-- Define a CTE to find employees hired more than 5 years ago
WITH LongTermEmployees AS (
SELECT EmployeeID
FROM Employees
-- Standard SQL: Compare HireDate to 5 years before today
WHERE HireDate <= CURRENT_DATE - INTERVAL '5' YEAR
)
-- Update salaries by 10% for long-term employees identified in the CTE
UPDATE Employees
SET EmployeeSalary = EmployeeSalary * 1.10
WHERE EmployeeID IN (SELECT EmployeeID FROM LongTermEmployees);
The CTE LongTermEmployees identifies employees who have worked more than five years. The UPDATE statement uses this CTE to selectively increase salaries.
Using CTE with a DELETE statement
Now, assume we have a table named Products and want to delete all products that haven’t been sold in the last 2 years. We can use a CTE to filter the products:
-- Define a CTE to identify products not sold in the last 2 years
WITH OldProducts AS (
SELECT ProductID
FROM Products
-- Standard SQL: Filter for dates older than 2 years ago
WHERE LastSoldDate < CURRENT_DATE - INTERVAL '2' YEAR
)
-- Delete products identified as old from the main table
DELETE FROM Products
WHERE ProductID IN (SELECT ProductID FROM OldProducts);
The CTE OldProducts identifies products that haven't been sold in the last two years, and then the DELETE statement uses this CTE to remove those products.
Using CTE with a MERGE statement
The MERGE statement in SQL allows for conditional updates, inserts, or deletions in a target table based on data in a source table. In the following example, the CTE MergedInventory combines new and existing inventory data. The MERGE statement then updates quantities for existing products or inserts new products based on the CTE data.
-- CTE to prepare the source data for the merge
WITH MergedInventory AS (
SELECT
ni.ProductID,
ni.Quantity AS NewQuantity
FROM NewInventoryData ni
)
-- Merge the prepared data into the Inventory table
MERGE INTO Inventory AS target
USING MergedInventory AS source
ON target.ProductID = source.ProductID
-- Update existing products with new quantities
WHEN MATCHED THEN
UPDATE SET target.Quantity = source.NewQuantity
-- Insert new products if they don't exist in the inventory
WHEN NOT MATCHED THEN
INSERT (ProductID, Quantity)
VALUES (source.ProductID, source.NewQuantity);
Recursive Common Table Expressions (CTEs)
Recursive CTEs are a special type of CTE that reference themselves within their definition, allowing the query to perform repeated operations. This makes them ideal for working with hierarchical data like organizational charts.
Introduction to recursive CTEs
Recursive CTEs are a special type of CTE that references itself within its definition, allowing the query to perform repeated operations. This makes them ideal for working with hierarchical or tree-structured data, such as organizational charts, directory structures, or product assemblies. The recursive CTE iteratively processes data, returning results step by step until the recursive member returns no new rows (the termination condition).
Anchor and recursive members
A recursive CTE consists of two main parts:
- Anchor Member: The part that defines the base query that starts the recursion.
- Recursive Member: The part that references the CTE itself, allowing it to perform the "recursive" operations.
Suppose we have an Employees table, where each row contains an EmployeeID, EmployeeName, and ManagerID. If we want to find all direct and indirect reports for a specific manager, we start with the anchor member identifying the top-level manager.
Note: In Standard SQL (PostgreSQL, MySQL, SQLite), you must use the RECURSIVE keyword.
WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor member: select the top-level manager
SELECT
EmployeeID,
EmployeeName,
ManagerID,
1 AS Level
FROM Employees
WHERE EmployeeID = 1 -- Starting with the top-level manager
UNION ALL
-- Recursive member: find employees who report to the current managers
SELECT
e.EmployeeID,
e.EmployeeName,
e.ManagerID,
eh.Level + 1
FROM Employees e
INNER JOIN EmployeeHierarchy eh
ON e.ManagerID = eh.EmployeeID
)
-- Select the final result from the CTE
SELECT EmployeeID, EmployeeName, Level
FROM EmployeeHierarchy;
How it works:
- Anchor: The query runs the Anchor Member first, finding the employee with ID 1.
- Recursion: The Recursive Member runs, looking for employees whose
ManagerIDmatches theEmployeeIDfound in the previous step. - Loop: This process repeats (Level 1 finds Level 2, Level 2 finds Level 3) until no new employees are found.
Potential Issues or Limitations of CTEs in SQL
Understanding the features and limitations of CTEs is important for writing logical and readable queries. Let us look at some limitations and potential issues of using CTEs in different databases.
SQL Server and Azure limitations
There are some environment-specific limitations for SQL CTEs when working with SQL Server or Azure Synapse Analytics. They include the following:
- SQL Server: The default maximum recursion level for recursive CTEs is 100. If this limit is exceeded without adjustment, an error occurs. CTE definitions cannot be nested directly inside another CTE definition (though you can chain multiple CTEs sequentially).
- Azure Synapse Analytics: Support varies by the specific pool type. Recursive CTEs are currently not supported in Dedicated SQL Pools (formerly SQL DW). However, they are supported in Serverless SQL Pools. Additionally, some DML operations (like
UPDATEorDELETEwith CTEs) may have syntax restrictions compared to standard SQL Server.
If you find yourself working with SQL Server, know that DataCamp has a lot of great resources to help. To start, I recommend taking DataCamp’s Introduction to SQL Server course to master the basics of SQL Server for data analysis. You can try our SQL Server Developer career track, which covers everything from transactions and error handling to time series analysis. Our Hierarchical and Recursive Queries in SQL Server course gets right to the heart of how to write advanced queries in SQL Server, including methods involving CTEs.
Other potential issues
Although CTEs are useful for simplifying complex queries, there are some common pitfalls you should be aware of. They include the following:
-
Infinite Loops in Recursive CTEs: If the termination condition for a recursive CTE is not met, it can result in an infinite loop, causing the query to run indefinitely. To avoid the recursive CTE running infinitely, use the
OPTION (MAXRECURSION N)hint to limit the maximum number of recursive iterations, whereNis a specified limit.- How to fix: In SQL Server, use the
OPTION (MAXRECURSION N)hint to limit the maximum number of recursive iterations. In PostgreSQL, you can use theCYCLEclause to detect loops automatically.
- How to fix: In SQL Server, use the
-
Performance Considerations: Recursive CTEs can become resource-intensive if the recursion depth is high or large datasets are being processed. To optimize the performance, limit the data processed in each iteration and ensure appropriate filtering to avoid excessive recursion levels.
Performance: CTEs vs Subqueries
A common myth is that CTEs are inherently faster than subqueries. In reality, most modern query optimizers (like those in SQL Server and PostgreSQL) "inline" standard CTEs, meaning they are processed exactly like subqueries with no performance difference.
However, CTEs can provide a performance boost through materialization, where the database calculates the CTE result once and caches it for multiple references within the main query.
I recommend using CTEs primarily for readability. Performance gains are situational and depend on how your specific database handles caching.
When to Use CTEs vs. Other Techniques
While CTEs are appropriate for simplifying queries involving repeated tasks, derived tables, views, and temp tables also serve similar purposes. The following table highlights the advantages and disadvantages of each method and when to use each.
| Technique | Advantages | Disadvantages | Suitable Use Case |
|---|---|---|---|
| CTEs | Temporary scope within a single queryNo storage or maintenance requiredImproves readability by modularizing code | Limited to the query in which they are defined | Organizing complex queries, temporary transformations, and breaking down multi-step operations |
| Derived Tables | Simplifies nested subqueriesNo need for permanent storage | Harder to read/maintain for complex queriesIt cannot be reused multiple times within a query | Quick, single-use transformations and aggregations within a query |
| Views | Reusable across queriesCan enhance security by restricting data access | Requires maintenance and can affect multiple queriesComplex views can impact performance | Long-term reusable logic and data access control |
Conclusion
Mastering CTEs takes practice, like anything: I recommend trying out DataCamp’s Associate Data Analyst in SQL career track to become a proficient data analyst. The Reporting in SQL course will also help you become proficient in building complex reports and dashboards for effective data presentation. Finally, you should obtain the SQL Associate Certification to showcase your mastery in using SQL to solve business problems and stand out among other professionals.
Become SQL Certified
SQL CTE FAQs
What is a CTE in SQL?
A CTE (common table expression) is a temporary, named result set defined within an SQL query using the WITH keyword, which is used to simplify complex queries by breaking them into smaller, more manageable parts.
How does a CTE differ from a view?
CTEs are temporary and exist only for the duration of a single query. Views are stored in the database and can be reused across multiple queries. CTEs don’t consume storage space, while views do.
Are CTEs faster than temporary tables?
Not necessarily. CTEs improve readability but may not always perform better than temp tables for large datasets.
Can CTEs be used in INSERT, UPDATE, or DELETE operations?
Yes, CTEs can be used in data modification statements to simplify the process, especially when filtering or joining data is involved.
What is the difference between non-recursive and recursive CTEs?
Non-recursive CTEs do not reference themselves and act similarly to a subquery or temporary table. Non-recursive CTEs simplify complex queries similar to subqueries or temporary tables. Recursive CTEs, on the other hand, reference themselves within the query definition and are used for iterative data processing, such as traversing hierarchical data structures. They are appropriate for tasks that require repeated execution, with each step building on the previous one.
