Skip to main content

CASE Statement in SQL Explained

Understand how to exploit SQL’s CASE statement to create categorical columns.
Jan 16, 2025  · 7 min read

If you’ve worked with databases, you’ve likely noticed that data is rarely perfect and often requires manipulation to extract meaningful insights.

To meet this need, SQL offers a powerful construct, called a CASE statement, which allows you to create new columns or transform data based on conditions applied to existing columns.

In this article, we’ll explain what a CASE statement is, why it’s essential and explore various use cases to help you harness its potential. Let’s get started!

What is a CASE Statement?

The Case Statement in SQL is a conditional expression that introduces a decision making logic into your query. It works similarly to an if-elif-else statement in Python, allowing you to evaluate multiple conditions and return specific results based on those conditions. 

The general syntax is as follows:

CASE
	WHEN boolean_condition1 THEN result1
	WHEN boolean_condition2 THEN result2
	ELSE result3
END

Here’s how it works:

  • The CASE keyword marks the beginning of the expression, while END marks its conclusion, enclosing the conditional statement like a frame.
  • Each WHEN clause evaluates a condition. If the condition is true, the corresponding THEN clause returns a specified result.
  • If none of these conditions in the WHEN clauses are met, the ELSE clause provides a default result.

Why is the CASE Statement Important?

The CASE WHEN statement is a valuable tool in SQL queries, offering several key benefits:

  • Data transformation: enables you to create new columns based on the values of existing columns.
  • Conditional aggregation: allows you to create aggregated values (e.g., sums, counts, max) for specific subsets of data based on conditions.
  • Data filtering: you can include or exclude rows based on specific criteria.
  • Query simplification and readability: It helps to consolidate logic into a single query, reducing the need for multiple queries, temporary tables, or complex joins.
  • Performance optimization: by reducing the complexity of the SQL queries, the CASE statement can also contribute to optimizing the performance and minimize the execution time, especially when it avoids the need for additional computations and transformations.

Simple CASE Statement

In this article, we are going to use the film table of the cinema database. Let’s suppose that we want to create a new categorical column based on the values of ROI (gross to budget ratio):

SELECT 
	title,
	gross,
	budget,
	gross / budget AS ROI,
CASE
	 WHEN gross / budget < 1 THEN 'low ROI'
	 WHEN gross / budget BETWEEN 1 AND 2 THEN 'medium ROI'
	 ELSE 'high ROI'
END as ROI_group
FROM cinema.films
WHERE budget > 0 AND gross > 0 AND release_year = 2015;

Given the gross and the budget, we have calculated the ROI, which is a popular profitability metric that measure the income generated by an investment.

In addition to the ROI field, we also obtained its categorization to communicate better:

  • If the films with ROI less than 1, the value is “low ROI”
  • The films with ROI between 1 and 2 belong to the category “medium ROI”
  • “High ROI” is the label assigned to films with an ROI greater than or equal to 2.

We have also filtered for films that have zero or negative budgets or gross values. Moreover, we restricted the results to films released in 2015.

Using CASE in Group By Clause

In the first example, we have created a new field, that can be applied to aggregate the data and obtain additional insights.

For instance, we can calculate the average gross and the average budget for each group of ROI:

SELECT 
CASE
	 WHEN gross / budget < 1 THEN 'low ROI'
	 WHEN gross / budget BETWEEN 1 AND 2 THEN 'medium ROI'
	 ELSE 'high ROI'
END AS ROI_group,
avg(gross) as avg_gross,
avg(budget) as avg_budget
FROM cinema.films
WHERE budget > 0 AND gross > 0 AND release_year = 2015
GROUP BY
CASE
	 WHEN gross / budget < 1 THEN 'low ROI'
	 WHEN gross / budget BETWEEN 1 AND 2 THEN 'medium ROI'
	 ELSE 'high ROI'
END
ORDER BY avg(gross) DESC
;

To group the data, we just put the case statement in the GROUP BY clause. To have additional insights, we ordered the rows on decreasing order based on the average gross. In this way, we are able to catch the highest average gross is assigned to the high ROI and vice versa for the low ROI.

Using CASE in WHERE Clause

Another benefit of CASE statement is that it enable to filter rows based on specific conditions. For example, we can apply different ROI thresholds based on tha language of the film. It’s known that English films have consistently high ROI. So, the threshold should have a higher value, while we suppose that the other languages have lower ROI.

SELECT 
    title,
    gross,
    budget,
    language,
    gross / budget AS ROI
FROM cinema.films
WHERE 
    CASE 
        WHEN language = 'English' THEN gross / budget > 2 -- English films need ROI > 2
        WHEN language = 'French' THEN gross / budget > 1.5 -- French films need ROI > 1.5
        ELSE gross / budget > 1.3 -- Default for other languages
    END;

Calculating Aggregations

We can also be interested in calculating the number of films with high ROI versus low ROI released in 2015:

SELECT 
    SUM(CASE WHEN gross / budget > 2 THEN 1 ELSE 0 END) AS high_roi_films,
    SUM(CASE WHEN gross / budget BETWEEN 1 AND 2 THEN 1 ELSE 0 END) AS medium_roi_films,
    SUM(CASE WHEN gross / budget < 1 THEN 1 ELSE 0 END) AS low_roi_films
FROM cinema.films
WHERE budget > 0 AND gross > 0 AND release_year = 2015;

Inside the SUM function, it’s like we create a new field with value 1 when the ROI reach that threshold, 0 otherwise. If we take the sum of this new binary variable we obtain the number of films that respect that criteria.

From the output, we can notice that most films have an ROI of less than 1, while few films are highly profitable films.

Nested Case Statement

One of the most advanced use cases is the nested case statement, which is a case into a case. It’s very useful when you need to divide the data into subgroups and apply a different logic for each subgroup. 

Similarly to the first example, we want to create a new column that should be influenced by the values of ROI. As we have said previously, the ROI for English-language films is high globally compared to the films in other languages. 

For this reason, it would be better to categorize films into ROI groups by considering both ROI and language

SELECT 
    title,
    language,
    gross / budget AS ROI,
    CASE
        WHEN language = 'English' THEN 
            CASE 
                WHEN gross / budget < 2 THEN 'low ROI'
                WHEN gross / budget BETWEEN 2 AND 3 THEN 'medium ROI'
                ELSE 'high ROI'
            END
        ELSE 
            CASE
                WHEN gross / budget < 1 THEN 'low ROI'
                WHEN gross / budget BETWEEN 1 AND 2 THEN 'medium ROI'
                ELSE 'high ROI'
            END
    END AS ROI_group
FROM cinema.films
WHERE budget > 0 AND gross > 0 and release_year = 2006;

In the query, the outer CASE statement is used to evaluate whether the language is English or not. Depending on the language, the inner CASE evaluates ROI: the thresholds are higher for English films and lower for non-English films. This multi-level condition allows to improve the separation of low, medium and high-roi films.

Conclusion

The CASE statement is an indispensable SQL construct to extract meaningful insights from your data. Theory is alternated with practical examples to help you deeply comprehend this powerful conditional expression. If you want to go deeper, take a look at data manipulation in SQL course. It covers the CASE statement, as well as other key SQL aspects, like subqueries and window functions. Finally, you should check the SQL Fundamentals to have an overview of courses that help to master SQL.


Eugenia Anello's photo
Author
Eugenia Anello
LinkedIn

Data Scientist - CRIF

Topics

Top SQL Courses

track

SQL Fundamentals

26hrs hr
Master the SQL fundamentals needed for business, learn how to write SQL queries, and start analyzing your data using this powerful language.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

tutorial

CASE Statements in PostgreSQL

In this tutorial, you'll learn how to write conditional queries in PostgreSQL using the PostgreSQL CASE conditional expression.
Sayak Paul's photo

Sayak Paul

7 min

tutorial

Aggregate Functions in SQL

Learn how to use aggregate functions for summarizing results and gaining useful insights about data in SQL.
Sayak Paul's photo

Sayak Paul

9 min

tutorial

Introduction to the ALTER TABLE Statement in SQL

In this tutorial, you will learn how to use and apply the ALTER TABLE statement in SQL.
Sayak Paul's photo

Sayak Paul

7 min

tutorial

SQL Tutorial: How To Write Better Queries

Learn about anti-patterns, execution plans, time complexity, query tuning, and optimization in SQL.
Karlijn Willems's photo

Karlijn Willems

35 min

tutorial

Using ORDER BY Keyword in SQL

In this tutorial, you will learn how to use and apply the ORDER BY keyword in SQL.
Sayak Paul's photo

Sayak Paul

4 min

code-along

Getting Started in SQL

Learn how to write basic queries in SQL and find answers to business questions.
Kelsey McNeillie's photo

Kelsey McNeillie

See MoreSee More