Skip to main content

Cartesian Product in SQL: A Comprehensive Guide

Discover the fundamentals of Cartesian products in SQL. Learn when and how to use them effectively, and explore best practices to optimize performance.
Jan 13, 2025  · 9 min read

The Cartesian product is a foundational concept in SQL that can be both powerful and complex. Understanding how to use it effectively can greatly enhance your data manipulation capabilities. In this guide, we'll explore the ins and outs of Cartesian products, providing practical insights and examples. 

For those new to SQL, consider starting with our Intermediate SQL course to build a solid foundation. Also, the SQL Basics Cheat Sheet, which you can download, is a helpful reference because it has all the most common SQL functions.

What is a Cartesian Product?

The Cartesian product in SQL is the result of combining every row from one table with every row from another. This operation is most often generated through a CROSS JOIN, forming a result set containing all possible pairs of rows from the two tables. While powerful, it can lead to very large result sets, especially when working with tables containing many rows, so it should be used with caution. In fact, I'll bet many developers probably first discover Cartesian products the hard way by accidentally omitting a join condition of some kind, only to end up with a massive result set.

Mathematical background

In mathematics, a Cartesian product (denoted as A × B) is a set of all possible ordered pairs created by pairing each element of set A with each element of set B. In SQL, this translates to pairing each row of one table with every row in another.

Cartesian product example.

Cartesian product example. Image by Author.

SQL cross join

The cross join operation in SQL creates a Cartesian product between two tables by joining every row from the first table with every row from the second. As a result, the cross-join generates all possible row combinations for the data. Let us look at an example of performing a cross join operation in SQL.

Assume you have two tables, Students and Courses:

Students table

name grade
Alice 10
Bob 11
Charlie 12

Courses table

course level
Math Basic
Science Advanced
History Intermediate

In the following query, each student is paired with every course, showing their grade and difficulty levels.

-- Perform the cross join to list each student with every course
SELECT Students.name, Students.grade, Courses.course, Courses.level
FROM Students
CROSS JOIN Courses;

Example of cartesian product in SQL resulting from cross join

Example of Cartesian product in SQL resulting from cross join. Image from Author.

When to Use and Not Use the Cartesian Product in SQL

The Cartesian product in SQL can be very useful when the behavior is understood and used intentionally. Let us look at the use cases (and potential pitfalls) of using the Cartesian product.

Use cases

The following are the scenarios where using the Cartesian product may be useful:

  • Generating Combinations: A Cartesian product is ideal for generating all combinations of two sets of values. For example, you may want to pair products with discounts, match employees to shifts, or create all possible pairings between different categories.
  • Testing Scenarios: When running test cases, a Cartesian product can be useful to simulate every possible scenario by matching various parameter sets or configurations.
  • Filling in Missing Data: A Cartesian product can also help identify gaps by generating all possible combinations and then left-joining the results to find missing entries, such as missing dates in a time series analysis.

Potential pitfalls

While the Cartesian product is useful for some scenarios, there are some risks which include the following:

  • Performance Issues: Since a Cartesian product multiplies the rows of one table with those of another, the result can grow exponentially. If the dataset is too large, this can lead to slow query execution times, excessive memory usage, and even database crashes.
  • Unintended Data Duplication: Without careful filtering, Cartesian products can produce rows that appear duplicated or irrelevant for the analysis leading to skewed data interpretations or incorrect results if the duplicates aren’t expected or managed correctly.
  • Difficult Debugging: Unintended Cartesian products, such as forgetting to specify a join condition, can lead to a cross join instead of an inner or outer join. This can result in unexpected outputs that can be challenging to trace back and correct.

Real Applications of the Cartesian Product

Although the Cartesian product may be used sparingly due to the potential risks, it is useful in scenarios where you need to combine every possible pair of data elements.

Data combination scenarios

The Cartesian product is useful for calculating relative performance metrics, analyzing compatibility, or creating ranking systems. When you need to compare all elements in two datasets, you can use the Cartesian product to pair each item with every other item. 

For example, to compare each employee’s performance with every other employee, the Cartesian product would create all possible employee pairs:

-- Select two columns with employee names from the Employees table
SELECT 
    A.employee_name AS Employee1,
    B.employee_name AS Employee2
FROM Employees AS A
-- Cross join to create all possible pairs of employees
CROSS JOIN Employees AS B;

Cartesian product is also important in e-commerce analytics to explore the full range of product-discount pairings. Suppose you have tables for Products and Discounts, a cross join could create a list of all product-discount combinations:

-- Select product names and discount types
SELECT 
    Products.product_name,
    Discounts.discount_type
FROM Products
-- Generate all product and discount combinations
CROSS JOIN Discounts;

Testing and validation

A Cartesian product enables the creation of all potential parameter combinations in testing environments, which is essential for comprehensive scenario testing. For example, if testing a system with various OS, device, and browser configurations, using a cross-join ensures every possible configuration is tested.

-- Select combinations of operating systems, devices, and browsers
SELECT 
    OS.name AS OperatingSystem,
    Device.name AS Device,
    Browser.name AS Browser
FROM OperatingSystems AS OS
-- All combinations with devices
CROSS JOIN Devices AS Device
-- All combinations with browsers           
CROSS JOIN Browsers AS Browser;

A Cartesian product can help identify missing records, making it useful for ensuring data completeness. For example, when analyzing daily sales data and confirming that data exists for each product every day, you can create a Cartesian product of the Products and Dates tables. Then, you can left-join actual sales data to identify any missing date-product entries.

-- Select dates and products with no sales
SELECT 
    Dates.date,
    Products.product_name,
    Sales.sales_amount
FROM Dates
-- Create all date and product combinations
CROSS JOIN Products                          
-- Match sales by date and product
LEFT JOIN 
    Sales ON Dates.date = Sales.date 
           AND Products.product_name = Sales.product_name  
-- Filter for combinations with no sales
WHERE Sales.sales_amount IS NULL;

Some Ideas on Performance

Since Cartesian products generate possible data combinations between two tables, there may be significant performance issues when large tables are involved. I suggest the following techniques for query optimization and managing large datasets to improve database performance.

Query optimization

Consider the following methods for writing efficient queries that involve Cartesian products.

  • Limit Row Counts Before Joining: To reduce the number of rows involved in the Cartesian product, apply filters to each table before performing the cross join. For example, use WHERE clauses to filter each table by relevant criteria, minimizing the resulting dataset.

  • Use SELECT with Specific Columns: When performing a Cartesian product, specify only the columns you need in the SELECT clause to reduce memory usage and improve performance. Avoid using SELECT * as it returns all columns, potentially leading to large, unwieldy results.

  • Apply Conditions to Convert Cross Joins to Inner Joins: If only certain row combinations are relevant, consider adding additional cross join conditions that function like the INNER JOIN. This helps restrict the output by narrowing down which rows from each table should be paired.

  • Batch Processing: For large datasets, break down the queries into manageable chunks for batch processing. This can prevent memory overflow and improve response time, especially when querying over large tables.

Managing large datasets

Use the following techniques to manage the large datasets that result from using Cartesian products.

  • Indexing: Add indexes on frequently joined columns to improve performance when filtering or pairing large tables. While indexing does not directly speed up cross joins, it is useful when you later filter or join the Cartesian product with other tables, allowing the database to access rows more quickly.

  • Partitioning Large Tables: Partitioning tables can improve the performance of queries involving Cartesian products. Partitioning divides tables into smaller, more manageable pieces based on specified columns, allowing the database to read only the relevant partitions during query execution.

  • Limit Result Rows: Use the LIMIT clause to restrict the number of rows returned in cases where you only need a sample from the Cartesian product for testing or verification purposes.

  • Use Temporary Tables for Intermediate Results: Store intermediate results in temporary tables if you need to reuse a Cartesian product multiple times to reduce re-computations and speed up subsequent operations.

Other SQL Operations to Help

In scenarios where generating a full Cartesian product may not be efficient, alternative methods exist. The methods below allow you to achieve your intended results without overwhelming the database.

Inner and outer joins

An INNER JOIN returns only rows with a match between the specified columns in both tables. This technique limits the result to relevant combinations and avoids the multiple rows that come with a Cartesian product. The inner join is appropriate when you need only the rows that meet specific criteria in both tables. 

In the example below, the INNER JOIN returns only products and discounts that share the same category.

-- Select products and corresponding discounts by category
SELECT 
    A.product_name,
    B.discount_type
FROM Products AS A
-- Match products and discounts by category
INNER JOIN Discounts AS B ON A.category_id = B.category_id;

The outer join (LEFT JOIN, RIGHT JOIN, and FULL JOIN) return matched and unmatched rows from one or both tables. This technique is helpful when retrieving all rows from one table and any matching rows from the other, but without creating all possible combinations. The outer join is appropriate when you want to include unmatched data for analysis but not a full cartesian pairing.

The LEFT JOIN in the query below returns all products, including those without a corresponding discount, with NULL values for unmatched discounts.

-- Select products with corresponding discounts, (NULL if no matching discount)
SELECT 
    A.product_name,
    B.discount_type
FROM Products AS A
-- Match products with discounts by category
LEFT JOIN Discounts AS B ON A.category_id = B.category_id;  

I recommend taking DataCamp’s Introduction to SQL Server course to learn more about how to join tables and retrieve data from multiple tables.

Subqueries and CTEs

Subqueries allow you to select specific data from one table and use it to filter results in the main query. This helps avoid unnecessary row combinations by filtering data to only the relevant entries. Subqueries are useful when you need to pre-filter data or compute specific values for use in the main query without needing a full cartesian join.

The subquery filters discounts with values over 10 before joining, reducing the data processed and avoiding an unnecessary cross join.

-- Select products with discounts greater than 10 by category
SELECT 
    A.product_name,
    B.discount
FROM Products AS A
-- Subquery to filter discounts over 10
INNER JOIN 
    (SELECT discount, category_id   
     FROM Discounts 
     WHERE discount > 10) AS B
-- Match products with filtered discounts by category
ON A.category_id = B.category_id;   

CTEs, or WITH clauses, enable complex queries to be broken down into steps. This method helps avoid Cartesian products by filtering or aggregating data before joining. CTEs are appropriate when simplifying multi-step queries, especially when pre-aggregating data or performing multiple joins, but want to avoid a full Cartesian result.

The query below uses a Common Table Expression (CTE) to filter discounts greater than 10 before joining with products based on category.

-- Filter discounts and join with products by category
WITH FilteredDiscounts AS (
    SELECT discount, category_id
    FROM Discounts
    WHERE discount > 10
)
SELECT 
    A.product_name,
    FD.discount
FROM Products AS A
-- Join with filtered discounts by category
INNER JOIN 
    FilteredDiscounts AS FD ON A.category_id = FD.category_id;

Innovative Applications of Cartesian Product

While the Cartesian product is widely used in basic SQL operations, it has applications beyond traditional database use cases. Let us look at how the Cartesian product can be used in machine learning data preparation and generating synthetic datasets for simulation.

Machine learning data preparation

Diverse and comprehensive datasets are important when preparing data for machine learning to train robust models. Cartesian products can create a complete dataset that covers all possible combinations of feature values, enabling a model to learn from a broader array of scenarios. 

For example, when building fraud detection models, you could combine transaction types, regions, and device types to simulate various conditions, helping the model to generalize across different cases.

Synthetic data generation

Synthetic data is often used to test systems under a wide range of conditions or to protect privacy by avoiding using real data. Cartesian products can help generate diverse synthetic datasets, which are useful in applications ranging from simulations to data augmentation for machine learning models.

For example, you could combine different market conditions, interest rates, and customer profiles in financial simulations to test how a model or algorithm performs across these variables.

Below is a simple example of how you might create synthetic data “out of nothing”, so to speak, in SQL and then use a CROSS JOIN to expand it into a larger dataset. The idea is that each small table represents a dimension of our test data and then, when we cross-join them, we get all possible combinations.

Example 1: Combining several dimensions

Suppose you want to simulate different financial scenarios based on Market conditions, Interest rates, and Customer profiles. We can define each dimension via a Common Table Expression (CTE) or temporary table using literal values. Then, a CROSS JOIN merges them into every possible combination:

WITH MarketConditions AS (
    SELECT 'Bull' AS condition
    UNION ALL
    SELECT 'Bear'
    UNION ALL
    SELECT 'Sideways'
),
InterestRates AS (
    SELECT 1.5 AS rate
    UNION ALL
    SELECT 2.0
    UNION ALL
    SELECT 3.25
),
CustomerProfiles AS (
    SELECT 'Younger' AS profile
    UNION ALL
    SELECT 'Mid-career'
    UNION ALL
    SELECT 'Retired'
)
SELECT 
    mc.condition,
    ir.rate,
    cp.profile
FROM 
    MarketConditions mc
CROSS JOIN 
    InterestRates ir
CROSS JOIN 
    CustomerProfiles cp;

The result set will have 3×3×3 = 27 rows. Each row is one unique combination of market condition, interest rate, and customer profile.

Example of synthetic data generation using Cartesian product.

Example of synthetic data generation using Cartesian product. Image by Author.

Example 2: Generating rows using sequences or simple VALUES

Another technique is to generate a range of numbers or time intervals by cross-joining small tables with themselves. This can be used to produce large volumes of synthetic rows for quick testing. For example, we can use the following query to generate the numbers 1 to 10.

WITH
Nums AS (
    SELECT 1 AS n
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    UNION ALL SELECT 5
    UNION ALL SELECT 6
    UNION ALL SELECT 7
    UNION ALL SELECT 8
    UNION ALL SELECT 9
    UNION ALL SELECT 10
)
SELECT n
FROM Nums;

Using the Cartesian product to generate rows using sequences.

Using the Cartesian product to generate rows using sequences. Image by Author.

We can then cross join the same 10-row set with itself to make 10×10=100:

WITH
Nums AS (
    SELECT 1 AS n
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
    UNION ALL SELECT 5
    UNION ALL SELECT 6
    UNION ALL SELECT 7
    UNION ALL SELECT 8
    UNION ALL SELECT 9
    UNION ALL SELECT 10
)
SELECT A.n AS n1, B.n AS n2
FROM Nums A
CROSS JOIN Nums B
ORDER BY A.n, B.n;

For the results, you will get 100 rows, each row having a pair of values (n1,n2). You can extend this logic to hundreds, thousands, or even millions of rows—though be aware of performance and storage.

By using these approaches—CTEs (or VALUES clauses) for dimension definitions and CROSS JOIN for combination— you can quickly create synthetic datasets in SQL without relying on existing tables.

Conclusion

Understanding the Cartesian product in SQL opens up a range of possibilities for data manipulation and analysis. By mastering this concept, you can enhance your ability to work with complex datasets and optimize your database queries. For more advanced insights, consider our SQL Server Fundamentals skill track to deepen your expertise. 

Depending on your career direction, I recommend enrolling in either DataCamp’s Associate Data Analyst in SQL career track to learn the industry-level skills required to become a data analyst or trying our Associate Date Engineer in SQL career track to become a data engineer. 

Associate Data Engineer in SQL

Gain practical knowledge in ETL, SQL, and data warehousing for data engineering.
Explore Track

Allan Ouko's photo
Author
Allan Ouko
LinkedIn
I create articles that simplify data science and analytics, making them easy to understand and accessible.

Cartesian Product FAQs

What is a cartesian product in SQL?

A cartesian product in SQL results from a cross join operation, where every row from one table is combined with every row from another.

How does cartesian product differ from other SQL joins?

Unlike other joins, a cartesian product combines all possible pairs of rows from the involved tables, often resulting in a large dataset.

When should I use a cartesian product?

Use a cartesian product when you need to generate all combinations of rows from two tables, such as in testing or specific analytical scenarios.

What are the performance considerations for cartesian products?

Cartesian products can lead to large result sets, which can impact performance. To mitigate this, optimize queries and manage datasets carefully.

Are there alternatives to using cartesian products in SQL?

Yes, depending on the specific use case, alternatives include using inner or outer joins, subqueries, and common table expressions.

Topics

Learn SQL with DataCamp

course

Introduction to SQL

2 hr
970.7K
Learn how to create and query relational databases using SQL in just two hours.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

tutorial

SQL Pivot Rows to Columns: A Comprehensive Guide

Learn the art of pivoting rows into columns in SQL in Oracle, MySQL, and SQL Server. Explore practical methods with detailed examples to boost your data engineering skills.
Allan Ouko's photo

Allan Ouko

9 min

tutorial

How to Use SQL PIVOT

Enhance your SQL skills with the SQL PIVOT operator. Learn to convert rows into columns to create pivot tables in SQL Server and Oracle.
Allan Ouko's photo

Allan Ouko

10 min

tutorial

CTE in SQL: A Complete Guide with Examples

Understand how to use common table expressions to simplify complex queries for improved readability. Learn the difference between non-recursive and recursive CTEs.
Allan Ouko's photo

Allan Ouko

10 min

tutorial

Databricks SQL: A Comprehensive Guide for Data Analytics and BI Workloads

Discover how to leverage Databricks SQL for efficient data analytics, querying, and business intelligence with practical examples and best practices.
Allan Ouko's photo

Allan Ouko

11 min

tutorial

Set Operators in SQL: A Comprehensive Guide

Set operations in SQL are techniques for combining or comparing the results of two or more SELECT statements.
Kurtis Pykes 's photo

Kurtis Pykes

9 min

code-along

Getting Started in SQL

Learn how to write basic queries in SQL and find answers to business questions.
Kelsey McNeillie's photo

Kelsey McNeillie

See MoreSee More