Course
In the realm of data analysis and management, SQL (Structured Query Language) stands as a critical tool for accessing and manipulating database systems. Among its many functions, the NOT IN operator serves as a powerful means to filter data sets, allowing users to exclude rows that match a specified list of values. This tutorial is designed with the aim of demystifying the NOT IN operator for beginners and data practitioners, ensuring clarity in its application and understanding its potential pitfalls.
Associate Data Engineer in SQL
Why NOT IN?
Choosing to focus on the NOT IN operator stems from its utility in data exclusion scenarios. Whether you're cleaning data, preparing reports, or conducting complex analyses, understanding how to effectively exclude specific data points is invaluable. This operator offers a straightforward syntax but comes with nuances that can affect performance and accuracy.
Our decision to explore NOT IN in depth is to provide a comprehensive understanding that equips you with the knowledge to use it effectively in your SQL queries.
Understanding SQL NOT IN
The Basics of NOT IN
The NOT IN operator is used within a WHERE clause to exclude rows where a specified column's value matches any in a given list of values. Its basic syntax is as follows:
SELECT column_names
FROM table_name
WHERE column_name NOT IN (value1, value2, ...);
This syntax highlights the operator's role in filtering data—a critical step in data analysis and database management.
Practical Applications of NOT IN
Consider a database containing a table of customer information. If you wish to select customers who are not from specific cities, NOT IN becomes an invaluable tool, allowing you to exclude those cities from your result set easily.
Step-by-Step NOT IN Examples
Example 1: Excluding Specific Values
Let's say we have a table named Customers with columns CustomerID, CustomerName, and City. To find customers who are not located in 'New York' or 'Los Angeles', the query would be:
|
CustomerlD |
CustomerName |
City |
|
1 |
John Doe |
New York |
|
2 |
Jane Smith |
Los Angeles |
|
3 |
Emily Jones |
Chicago |
|
4 |
Chris Brown |
Miami |
|
5 |
Alex Johnson |
San Francisco |
|
6 |
Jessica White |
New York |
Customers Table
SELECT CustomerName, City
FROM Customers
WHERE City NOT IN ('New York', 'Los Angeles');
|
CustomerName |
City |
|
Emily Jones |
Chicago |
|
Chris Brown |
Miami |
|
Alex Johnson |
San Francisco |
Example 2: Using NOT IN with Subqueries
NOT IN can also work with subqueries. For example, to find products that have not been ordered:
|
ProductID |
ProductName |
|
1 |
Apple |
|
2 |
Banana |
|
3 |
Orange |
|
4 |
Pear |
|
5 |
Grape |
Table Products
|
OrderID |
ProductID |
|
101 |
2 |
|
102 |
4 |
|
103 |
2 |
|
104 |
3 |
Order Table
SELECT ProductName
FROM Products
WHERE ProductID NOT IN (
SELECT ProductID
FROM Orders
);
|
ProductName |
|
Apple |
|
Grape |
Common Pitfalls and How to Avoid Them
Handling NULL Values
A common issue with NOT IN arises when the list of values contains NULL. Because NULL represents an unknown value, any comparison with NULL using NOT IN will not return any rows, even if there are rows that should logically be excluded from the list.
To circumvent this, ensure that the list of values does not contain NULL or use an alternative approach such as NOT EXISTS.
Example:
Suppose we have a table Orders with a column CustomerID, some of which are NULL to represent guest orders. If you run a query to find orders not made by certain customers, including NULL in the list would cause no results to be returned.
|
OrderID |
CustomerID |
|
101 |
3 |
|
102 |
4 |
|
103 |
2 |
|
104 |
|
|
106 |
1 |
Orders Table
-- Assume we want to exclude CustomerID 1, 2, and unknown (NULL) customers
SELECT OrderID
FROM Orders
WHERE CustomerID NOT IN (1, 2, NULL);
This query would return no rows, which is likely not the intended outcome.
Solution: To ensure accurate results, avoid including NULL in the list for NOT IN. Alternatively, use a combination of NOT IN for known values and IS NOT NULL for handling NULLs effectively.
SELECT OrderID
FROM Orders
WHERE CustomerID NOT IN (1, 2) AND CustomerID IS NOT NULL;
|
OrderID |
|
101 |
|
102 |
Performance Considerations
For large datasets or subqueries, NOT IN can be less efficient than alternatives like NOT EXISTS or LEFT JOIN /IS NULL. The inefficiency stems from how NOT IN compares each row in the table to each value in the list, which can lead to slow performance on large datasets.
Alternatives to SQL NOT IN
Using NOT EXISTS
NOT EXISTS is often recommended over NOT IN when dealing with subqueries that might return NULL values. It is generally more efficient because it stops as soon as it finds a match.
Example:
To find products that have not been ordered, using NOT EXISTS:
SELECT ProductName
FROM Products p
WHERE NOT EXISTS (
SELECT 1
FROM Orders o
WHERE o.ProductID = p.ProductID
);
|
ProductName |
|
Apple |
|
Grape |
This query checks for the non-existence of a product in the orders table, which can be more efficient than comparing against a potentially large list of IDs with NOT IN.
LEFT JOIN / IS NULL
Another efficient alternative for excluding rows is to use a LEFT JOIN combined with a WHERE ... IS NULL clause. This method is particularly effective for large datasets.
Example:
To achieve the same goal of finding products that have not been ordered:
SELECT p.ProductName
FROM Products p
LEFT JOIN Orders o ON p.ProductID = o.ProductID
WHERE o.ProductID IS NULL;
|
ProductName |
|
Apple |
|
Grape |
This method leverages the LEFT JOIN to include all products and any matching orders, then filters out those products that have orders using WHERE o.ProductID IS NULL, effectively excluding them.
Best Practices for Using NOT IN
- Verify the list for
NULLvalues: Ensure the list used inNOT INdoes not contain NULL values to avoid unexpected results. - Consider dataset size: For large datasets or subqueries, evaluate whether alternatives like
NOT EXISTSmight offer better performance. - Indexing: Ensure that the columns used in the
NOT INclause are indexed, if possible, to improve query performance.
Using NOT IN in Real Life Data Project
In data analysis, the "Not IN" SQL command is commonly used to retrieve specific data. In this section, we will work with a bookstore database that keeps track of its inventory (books available in the store) and sales (books sold). Our goal is to identify which books have never been sold.
We have two tables:
- Inventory: to store information about books in the bookstore.
- Sales: to store records of books that have been sold.

Inventory Table

Sales Table
Now, we want to find out which books have never been sold.
SELECT Title, Author
FROM Inventory
WHERE BookID NOT IN (
SELECT BookID
FROM Sales
);

Next, we want to identify books in our inventory that have not been sold in the last month, assuming today is February 7, 2024. This involves checking the SaleDate in the Sales table against our current date.
SELECT Title, Author
FROM Inventory
WHERE BookID NOT IN (
SELECT BookID
FROM Sales
WHERE SaleDate >= date('now', '-1 month')
);

Conclusion
The NOT IN operator is a versatile tool in SQL for excluding specific values from your query results. By understanding its syntax, practical applications, and potential pitfalls, you can effectively use this operator in your data manipulation tasks.
Remember to consider NULL values and dataset size to optimize your queries' performance and accuracy. As you become more familiar with NOT IN, you'll find it an invaluable addition to your SQL toolkit, enabling more precise and efficient data analysis and management.
You can learn more about NOT IN and NOT EXISTS in our Improving Query Performance in SQL Server course and explore SQL operators and much more in our SQL Basics Cheat Sheet and SQL Fundamentals skill track.
Become a Data Engineer

As a certified data scientist, I am passionate about leveraging cutting-edge technology to create innovative machine learning applications. With a strong background in speech recognition, data analysis and reporting, MLOps, conversational AI, and NLP, I have honed my skills in developing intelligent systems that can make a real impact. In addition to my technical expertise, I am also a skilled communicator with a talent for distilling complex concepts into clear and concise language. As a result, I have become a sought-after blogger on data science, sharing my insights and experiences with a growing community of fellow data professionals. Currently, I am focusing on content creation and editing, working with large language models to develop powerful and engaging content that can help businesses and individuals alike make the most of their data.