course
Cartesian Product in SQL: A Comprehensive Guide
The Cartesian product is a foundational concept in SQL that can be both powerful and complex. Understanding how to use it effectively can greatly enhance your data manipulation capabilities. In this guide, we'll explore the ins and outs of Cartesian products, providing practical insights and examples.
For those new to SQL, consider starting with our Intermediate SQL course to build a solid foundation. Also, the SQL Basics Cheat Sheet, which you can download, is a helpful reference because it has all the most common SQL functions.
What is a Cartesian Product?
The Cartesian product in SQL is the result of combining every row from one table with every row from another. This operation is most often generated through a CROSS JOIN
, forming a result set containing all possible pairs of rows from the two tables. While powerful, it can lead to very large result sets, especially when working with tables containing many rows, so it should be used with caution. In fact, I'll bet many developers probably first discover Cartesian products the hard way by accidentally omitting a join condition of some kind, only to end up with a massive result set.
Mathematical background
In mathematics, a Cartesian product (denoted as A × B) is a set of all possible ordered pairs created by pairing each element of set A with each element of set B. In SQL, this translates to pairing each row of one table with every row in another.
Cartesian product example. Image by Author.
SQL cross join
The cross join operation in SQL creates a Cartesian product between two tables by joining every row from the first table with every row from the second. As a result, the cross-join generates all possible row combinations for the data. Let us look at an example of performing a cross join operation in SQL.
Assume you have two tables, Students
and Courses
:
Students table
name | grade |
---|---|
Alice | 10 |
Bob | 11 |
Charlie | 12 |
Courses table
course | level |
---|---|
Math | Basic |
Science | Advanced |
History | Intermediate |
In the following query, each student is paired with every course, showing their grade and difficulty levels.
-- Perform the cross join to list each student with every course
SELECT Students.name, Students.grade, Courses.course, Courses.level
FROM Students
CROSS JOIN Courses;
Example of Cartesian product in SQL resulting from cross join. Image from Author.
When to Use and Not Use the Cartesian Product in SQL
The Cartesian product in SQL can be very useful when the behavior is understood and used intentionally. Let us look at the use cases (and potential pitfalls) of using the Cartesian product.
Use cases
The following are the scenarios where using the Cartesian product may be useful:
- Generating Combinations: A Cartesian product is ideal for generating all combinations of two sets of values. For example, you may want to pair products with discounts, match employees to shifts, or create all possible pairings between different categories.
- Testing Scenarios: When running test cases, a Cartesian product can be useful to simulate every possible scenario by matching various parameter sets or configurations.
- Filling in Missing Data: A Cartesian product can also help identify gaps by generating all possible combinations and then left-joining the results to find missing entries, such as missing dates in a time series analysis.
Potential pitfalls
While the Cartesian product is useful for some scenarios, there are some risks which include the following:
- Performance Issues: Since a Cartesian product multiplies the rows of one table with those of another, the result can grow exponentially. If the dataset is too large, this can lead to slow query execution times, excessive memory usage, and even database crashes.
- Unintended Data Duplication: Without careful filtering, Cartesian products can produce rows that appear duplicated or irrelevant for the analysis leading to skewed data interpretations or incorrect results if the duplicates aren’t expected or managed correctly.
- Difficult Debugging: Unintended Cartesian products, such as forgetting to specify a join condition, can lead to a cross join instead of an inner or outer join. This can result in unexpected outputs that can be challenging to trace back and correct.
Real Applications of the Cartesian Product
Although the Cartesian product may be used sparingly due to the potential risks, it is useful in scenarios where you need to combine every possible pair of data elements.
Data combination scenarios
The Cartesian product is useful for calculating relative performance metrics, analyzing compatibility, or creating ranking systems. When you need to compare all elements in two datasets, you can use the Cartesian product to pair each item with every other item.
For example, to compare each employee’s performance with every other employee, the Cartesian product would create all possible employee pairs:
-- Select two columns with employee names from the Employees table
SELECT
A.employee_name AS Employee1,
B.employee_name AS Employee2
FROM Employees AS A
-- Cross join to create all possible pairs of employees
CROSS JOIN Employees AS B;
Cartesian product is also important in e-commerce analytics to explore the full range of product-discount pairings. Suppose you have tables for Products
and Discounts
, a cross join could create a list of all product-discount combinations:
-- Select product names and discount types
SELECT
Products.product_name,
Discounts.discount_type
FROM Products
-- Generate all product and discount combinations
CROSS JOIN Discounts;
Testing and validation
A Cartesian product enables the creation of all potential parameter combinations in testing environments, which is essential for comprehensive scenario testing. For example, if testing a system with various OS, device, and browser configurations, using a cross-join ensures every possible configuration is tested.
-- Select combinations of operating systems, devices, and browsers
SELECT
OS.name AS OperatingSystem,
Device.name AS Device,
Browser.name AS Browser
FROM OperatingSystems AS OS
-- All combinations with devices
CROSS JOIN Devices AS Device
-- All combinations with browsers
CROSS JOIN Browsers AS Browser;
A Cartesian product can help identify missing records, making it useful for ensuring data completeness. For example, when analyzing daily sales data and confirming that data exists for each product every day, you can create a Cartesian product of the Products
and Dates
tables. Then, you can left-join actual sales data to identify any missing date-product entries.
-- Select dates and products with no sales
SELECT
Dates.date,
Products.product_name,
Sales.sales_amount
FROM Dates
-- Create all date and product combinations
CROSS JOIN Products
-- Match sales by date and product
LEFT JOIN
Sales ON Dates.date = Sales.date
AND Products.product_name = Sales.product_name
-- Filter for combinations with no sales
WHERE Sales.sales_amount IS NULL;
Some Ideas on Performance
Since Cartesian products generate possible data combinations between two tables, there may be significant performance issues when large tables are involved. I suggest the following techniques for query optimization and managing large datasets to improve database performance.
Query optimization
Consider the following methods for writing efficient queries that involve Cartesian products.
-
Limit Row Counts Before Joining: To reduce the number of rows involved in the Cartesian product, apply filters to each table before performing the cross join. For example, use
WHERE
clauses to filter each table by relevant criteria, minimizing the resulting dataset. -
Use SELECT with Specific Columns: When performing a Cartesian product, specify only the columns you need in the
SELECT
clause to reduce memory usage and improve performance. Avoid usingSELECT *
as it returns all columns, potentially leading to large, unwieldy results. -
Apply Conditions to Convert Cross Joins to Inner Joins: If only certain row combinations are relevant, consider adding additional cross join conditions that function like the
INNER JOIN
. This helps restrict the output by narrowing down which rows from each table should be paired. -
Batch Processing: For large datasets, break down the queries into manageable chunks for batch processing. This can prevent memory overflow and improve response time, especially when querying over large tables.
Managing large datasets
Use the following techniques to manage the large datasets that result from using Cartesian products.
-
Indexing: Add indexes on frequently joined columns to improve performance when filtering or pairing large tables. While indexing does not directly speed up cross joins, it is useful when you later filter or join the Cartesian product with other tables, allowing the database to access rows more quickly.
-
Partitioning Large Tables: Partitioning tables can improve the performance of queries involving Cartesian products. Partitioning divides tables into smaller, more manageable pieces based on specified columns, allowing the database to read only the relevant partitions during query execution.
-
Limit Result Rows: Use the
LIMIT
clause to restrict the number of rows returned in cases where you only need a sample from the Cartesian product for testing or verification purposes. -
Use Temporary Tables for Intermediate Results: Store intermediate results in temporary tables if you need to reuse a Cartesian product multiple times to reduce re-computations and speed up subsequent operations.
Other SQL Operations to Help
In scenarios where generating a full Cartesian product may not be efficient, alternative methods exist. The methods below allow you to achieve your intended results without overwhelming the database.
Inner and outer joins
An INNER JOIN
returns only rows with a match between the specified columns in both tables. This technique limits the result to relevant combinations and avoids the multiple rows that come with a Cartesian product. The inner join is appropriate when you need only the rows that meet specific criteria in both tables.
In the example below, the INNER JOIN
returns only products and discounts that share the same category.
-- Select products and corresponding discounts by category
SELECT
A.product_name,
B.discount_type
FROM Products AS A
-- Match products and discounts by category
INNER JOIN Discounts AS B ON A.category_id = B.category_id;
The outer join (LEFT JOIN
, RIGHT JOIN
, and FULL JOIN
) return matched and unmatched rows from one or both tables. This technique is helpful when retrieving all rows from one table and any matching rows from the other, but without creating all possible combinations. The outer join is appropriate when you want to include unmatched data for analysis but not a full cartesian pairing.
The LEFT JOIN
in the query below returns all products, including those without a corresponding discount, with NULL
values for unmatched discounts.
-- Select products with corresponding discounts, (NULL if no matching discount)
SELECT
A.product_name,
B.discount_type
FROM Products AS A
-- Match products with discounts by category
LEFT JOIN Discounts AS B ON A.category_id = B.category_id;
I recommend taking DataCamp’s Introduction to SQL Server course to learn more about how to join tables and retrieve data from multiple tables.
Subqueries and CTEs
Subqueries allow you to select specific data from one table and use it to filter results in the main query. This helps avoid unnecessary row combinations by filtering data to only the relevant entries. Subqueries are useful when you need to pre-filter data or compute specific values for use in the main query without needing a full cartesian join.
The subquery filters discounts with values over 10 before joining, reducing the data processed and avoiding an unnecessary cross join.
-- Select products with discounts greater than 10 by category
SELECT
A.product_name,
B.discount
FROM Products AS A
-- Subquery to filter discounts over 10
INNER JOIN
(SELECT discount, category_id
FROM Discounts
WHERE discount > 10) AS B
-- Match products with filtered discounts by category
ON A.category_id = B.category_id;
CTEs, or WITH
clauses, enable complex queries to be broken down into steps. This method helps avoid Cartesian products by filtering or aggregating data before joining. CTEs are appropriate when simplifying multi-step queries, especially when pre-aggregating data or performing multiple joins, but want to avoid a full Cartesian result.
The query below uses a Common Table Expression (CTE) to filter discounts greater than 10 before joining with products based on category.
-- Filter discounts and join with products by category
WITH FilteredDiscounts AS (
SELECT discount, category_id
FROM Discounts
WHERE discount > 10
)
SELECT
A.product_name,
FD.discount
FROM Products AS A
-- Join with filtered discounts by category
INNER JOIN
FilteredDiscounts AS FD ON A.category_id = FD.category_id;
Innovative Applications of Cartesian Product
While the Cartesian product is widely used in basic SQL operations, it has applications beyond traditional database use cases. Let us look at how the Cartesian product can be used in machine learning data preparation and generating synthetic datasets for simulation.
Machine learning data preparation
Diverse and comprehensive datasets are important when preparing data for machine learning to train robust models. Cartesian products can create a complete dataset that covers all possible combinations of feature values, enabling a model to learn from a broader array of scenarios.
For example, when building fraud detection models, you could combine transaction types, regions, and device types to simulate various conditions, helping the model to generalize across different cases.
Synthetic data generation
Synthetic data is often used to test systems under a wide range of conditions or to protect privacy by avoiding using real data. Cartesian products can help generate diverse synthetic datasets, which are useful in applications ranging from simulations to data augmentation for machine learning models.
For example, you could combine different market conditions, interest rates, and customer profiles in financial simulations to test how a model or algorithm performs across these variables.
Below is a simple example of how you might create synthetic data “out of nothing”, so to speak, in SQL and then use a CROSS JOIN
to expand it into a larger dataset. The idea is that each small table represents a dimension of our test data and then, when we cross-join them, we get all possible combinations.
Example 1: Combining several dimensions
Suppose you want to simulate different financial scenarios based on Market conditions, Interest rates, and Customer profiles. We can define each dimension via a Common Table Expression (CTE) or temporary table using literal values. Then, a CROSS JOIN
merges them into every possible combination:
WITH MarketConditions AS (
SELECT 'Bull' AS condition
UNION ALL
SELECT 'Bear'
UNION ALL
SELECT 'Sideways'
),
InterestRates AS (
SELECT 1.5 AS rate
UNION ALL
SELECT 2.0
UNION ALL
SELECT 3.25
),
CustomerProfiles AS (
SELECT 'Younger' AS profile
UNION ALL
SELECT 'Mid-career'
UNION ALL
SELECT 'Retired'
)
SELECT
mc.condition,
ir.rate,
cp.profile
FROM
MarketConditions mc
CROSS JOIN
InterestRates ir
CROSS JOIN
CustomerProfiles cp;
The result set will have 3×3×3 = 27 rows. Each row is one unique combination of market condition, interest rate, and customer profile.
Example of synthetic data generation using Cartesian product. Image by Author.
Example 2: Generating rows using sequences or simple VALUES
Another technique is to generate a range of numbers or time intervals by cross-joining small tables with themselves. This can be used to produce large volumes of synthetic rows for quick testing. For example, we can use the following query to generate the numbers 1 to 10.
WITH
Nums AS (
SELECT 1 AS n
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10
)
SELECT n
FROM Nums;
Using the Cartesian product to generate rows using sequences. Image by Author.
We can then cross join the same 10-row set with itself to make 10×10=100:
WITH
Nums AS (
SELECT 1 AS n
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9
UNION ALL SELECT 10
)
SELECT A.n AS n1, B.n AS n2
FROM Nums A
CROSS JOIN Nums B
ORDER BY A.n, B.n;
For the results, you will get 100 rows, each row having a pair of values (n1,n2). You can extend this logic to hundreds, thousands, or even millions of rows—though be aware of performance and storage.
By using these approaches—CTEs (or VALUES
clauses) for dimension definitions and CROSS JOIN
for combination— you can quickly create synthetic datasets in SQL without relying on existing tables.
Conclusion
Understanding the Cartesian product in SQL opens up a range of possibilities for data manipulation and analysis. By mastering this concept, you can enhance your ability to work with complex datasets and optimize your database queries. For more advanced insights, consider our SQL Server Fundamentals skill track to deepen your expertise.
Depending on your career direction, I recommend enrolling in either DataCamp’s Associate Data Analyst in SQL career track to learn the industry-level skills required to become a data analyst or trying our Associate Date Engineer in SQL career track to become a data engineer.
Associate Data Engineer in SQL
Cartesian Product FAQs
What is a cartesian product in SQL?
A cartesian product in SQL results from a cross join operation, where every row from one table is combined with every row from another.
How does cartesian product differ from other SQL joins?
Unlike other joins, a cartesian product combines all possible pairs of rows from the involved tables, often resulting in a large dataset.
When should I use a cartesian product?
Use a cartesian product when you need to generate all combinations of rows from two tables, such as in testing or specific analytical scenarios.
What are the performance considerations for cartesian products?
Cartesian products can lead to large result sets, which can impact performance. To mitigate this, optimize queries and manage datasets carefully.
Are there alternatives to using cartesian products in SQL?
Yes, depending on the specific use case, alternatives include using inner or outer joins, subqueries, and common table expressions.
Learn SQL with DataCamp
course
Data Manipulation in SQL
track
SQL for Business Analysts
tutorial
SQL Pivot Rows to Columns: A Comprehensive Guide

Allan Ouko
9 min
tutorial
How to Use SQL PIVOT

Allan Ouko
10 min
tutorial
CTE in SQL: A Complete Guide with Examples

Allan Ouko
10 min
tutorial
Databricks SQL: A Comprehensive Guide for Data Analytics and BI Workloads

Allan Ouko
11 min
tutorial
Set Operators in SQL: A Comprehensive Guide
code-along
Getting Started in SQL

Kelsey McNeillie