Blog

How to Use GROUP BY and HAVING in SQL

An intuitive guide for discovering the two most popular SQL commands to aggregate rows of your dataset

Updated Feb 2023 · 6 min read

Read the Spanish version 🇪🇸 of this article.

Aggregation is another name for summarizing your data points to get a single value. For example, calculating the mean or the minimum. Sometimes, aggregating all your data will result in a value that isn't useful.

For example, if you are exploring buying behavior in your store, and the people who come in are a mix of poor students and rich professionals, it will be more informative to calculate the mean spend for those groups separately. That is, you need to aggregate the amount spent, grouped by different customer segments.

This tutorial covers the SQL GROUP BY statement, as well as the HAVING statement that helps you control which rows of data are included in each group.

HAVING is closely related to the WHERE statement, and you may wish to read the Introduction to the WHERE Clause in SQL tutorial first. You must also understand the SELECT and FROM statements, as covered in the SQL Query Examples and Tutorial.

Let’s get started!

Master your data skills with DataCamp

Learn the skills you need at your own pace—from non-coding essentials to data science and machine learning.

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.

Run and edit the code from this tutorial online

Run Code

Unlock your career with custom learning

Trying to strengthen your data skills? Our AI assistant explores your goals and interests to recommend the perfect content. Start building skills that matter to you and your career.

Discover

Unicorn companies database

We will use the Unicorn Companies Database, which is available on DataLab, DataCamp's AI-enabled data notebook. These companies are called “Unicorn” because they are startup companies with a valuation of over a billion dollars. So, this database contains the data of these Unicorn Companies and is composed of seven tables. For simplicity, we’ll focus on three tables: companies, sales, and product_emissions.

Using SQL `GROUP BY`

GROUP BY is a SQL command commonly used to aggregate the data to get insights from it. There are three phases when you group data:

Split: the dataset is split up into chunks of rows based on the values of the variables we have chosen for the aggregation
Apply: Compute an aggregate function, like average, minimum and maximum, returning a single value
Combine: All these resulting outputs are combined in a unique table. In this way, we’ll have a single value for each modality of the variable of interest.

SQL `GROUP BY` Example 1

We can begin by showing a simple example of GROUP BY. Suppose we want to find the top ten countries with the highest number of Unicorn companies.

SELECT * 
FROM companies

It also would be nice to order the results in decreasing order based on the number of companies

SELECT country, COUNT(*) AS n_companies
FROM companies
GROUP BY country
ORDER BY n_companies DESC
LIMIT 10

Here we have the results. You will probably not be surprised to find the US, China, and India in the ranking. Let’s explain the decision behind this query:

First, notice that we used COUNT(*) to count the rows for each group, which corresponds to the country. In addition, we also used the SQL alias to rename the column into a more explainable name. This is possible by using the keyword AS, followed by the new name. COUNT is covered in more depth in the COUNT() SQL FUNCTION tutorial.
The fields were selected from the table companies, where each row corresponds to a Unicorn company.
After, we need to specify the column name after GROUP BY to aggregate the data based on the country.
ORDER BY is required to visualize the countries in the right order, from the highest number to the lower number of companies.
We limit the results to 10 using LIMIT, which is followed by the number of rows you want in the results.

SQL `GROUP BY` Example 2

Now, we will analyze the table with the sales. For each order number, we have the type of client, the product line, the quantity, the unit price, the total, etc.

This time, we are interested in finding the average price per unit, the total number of orders, and the total gain for each product line:

SELECT 
    product_line,
    AVG(unit_price) AS avg_price,
    SUM(quantity) AS tot_pieces,
    SUM(total) AS total_gain
FROM sales
GROUP BY product_line
ORDER BY total_gain DESC

Instead of counting the number of rows, we have the AVG() function to obtain the average price and the SUM() function to calculate the total number of orders and the total gain for each product line.
As before, we specify the column initially dividing the dataset into chunks. Then the aggregation functions will allow us to obtain a row per each modality of the product line.
This time, ORDER BY is optional. It was included to highlight how the higher total gains are not always proportional to higher average prices or total pieces.

The limitations of `WHERE`

Let’s take the previous example again. Now, we want to put a condition to the query: we only want to filter for the total number of orders higher than 40,000. Let's try the WHERE clause:

SELECT 
    product_line,
    AVG(unit_price) AS avg_price,
    SUM(quantity) AS tot_pieces,
    SUM(total) AS total_gain
FROM sales
WHERE SUM(total) > 40000
GROUP BY product_line
ORDER BY total_gain DESC

This query will return the following error:

This error’s not possible to pass aggregated functions in the WHERE clause. We need a new command to solve this issue.

Using SQL HAVING

Like WHERE, the HAVING clause filters the rows of a table. Whereas WHERE tried to filter the whole table, HAVING filters rows within each of the groups defined by GROUP BY

SQL HAVING Example 1

Here's the previous example again, replacing the word WHERE with HAVING.

SELECT 
    product_line,
    AVG(unit_price) AS avg_price,
    SUM(quantity) AS tot_pieces,
    SUM(total) AS total_gain
FROM sales
GROUP BY product_line
HAVING SUM(total) > 40000
ORDER BY total_gain DESC

This time it will produce three rows. The other product lines didn’t match the criterion, so we passed from six results to three.

What else do you notice from the query? We didn’t pass the column alias to HAVING, but the aggregation of the original field. Are you asking yourself why? You’ll unravel the mystery in the next example.

SQL HAVING Example 2

As the last example, we will use the table called product_emissions, which contains the emission of the products provided by the companies.

This time, we are interested in showing the average product carbon footprint (pcf) for each company that belongs to the industry group “Technology Hardware & Equipment.” Moreover, it would be helpful to see the number of products for each company to understand if there is some relationship between the number of products and the carbon footprint. We also again use HAVING to extract companies with an average carbon footprint of over 100.

SELECT pe.company, count(product_name) AS n_products, avg(carbon_footprint_pcf) AS avg_carbon_footprint_pcf
FROM product_emissions AS pe
WHERE industry_group = 'Technology Hardware & Equipment'
GROUP BY pe.company, industry_group
having avg_carbon_footprint_pcf>100
ORDER BY n_products

An error appeared after trying to use the alias. For the HAVING clause, the new column’s name doesn’t exist, so it won’t be able to filter the query. Let’s correct the request:

SELECT pe.company, count(product_name) AS n_products, avg(carbon_footprint_pcf) AS avg_carbon_footprint_pcf
FROM product_emissions AS pe
WHERE industry_group = 'Technology Hardware & Equipment'
GROUP BY pe.company, industry_group
having avg(carbon_footprint_pcf)>100
ORDER BY n_products

This time, the condition worked, and we can visualize the results from the table. We just learned that column aliases can’t be used in HAVING because this condition is applied before the SELECT. For this reason, it cannot recognize the fields from the new names.

SQL Order of Execution

This is the order of the commands while writing the query:

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

But there is a question you need to ask yourself. In what order do SQL commands execute? As humans, we often take for granted that the computer reads and interprets SQL from top to down. But the reality is different from what it might look like. This is the right order of execution:

FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT

So, the query processor doesn’t start from SELECT, but it begins by selecting which tables to include, and SELECT is executed after HAVING. This explains why HAVING doesn’t allow the use of ALIAS, while ORDER BY doesn’t have problems with it. In addition to this aspect, this order of execution clarifies the reason why HAVING is used together with GROUP BY to apply conditions on aggregated data, while WHERE cannot.

Take it to the next level

After reading this tutorial, you should have a clear idea of the difference between GROUP BY and HAVING. You can practice on DataLab to master these concepts.

If you want to move to the next level of the SQL learning path, you can take our Intermediate SQL course. If you still need to strengthen your foundations of SQL, you can go back to the Introduction to SQL course to learn about the fundamentals of the language.

Topics

SQL

Data Analysis

Big Data

SQL Courses

Course

Introduction to SQL

2 hr

615.8K

Learn how to create and query relational databases using SQL in just two hours.

See Details

Start Course

Course

Intermediate SQL

4 hr

196.5K

Accompanied at every step with hands-on practice queries, this course teaches you everything you need to know to analyze data using your own SQL code today!

See Details

Start Course

Course

Introduction to SQL Server

4 hr

140.8K

Learn to use SQL Server to perform common data manipulation tasks and master common data manipulation tasks using this database system.

See Details

Start Course

SQL Developer Salaries: Expectations in 2024

In this article, we're going to learn what an SQL developer does, what factors impact SQL developer salaries, how the average SQL developer salary compares to the average salaries of other developer profiles, and how you can increase your salary as an SQL developer.

Elena Kosourova

7 min

Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx

RIchie and Libby explore the differences between analytics and business intelligence, generative AI and its implications in analytics, the role of data quality and governance, Alteryx’s AI platform, data skills as a workplace necessity, and more.

Richie Cotton

43 min

[Radar Recap] Building a Learning Culture for Analytics Functions, with Russell Johnson, Denisse Groenendaal-Lopez and Mark Stern

In the session, Russell Johnson, Chief Data Scientist at Marks & Spencer, Denisse Groenendaal-Lopez, Learning & Development Business Partner at Booking Group, and Mark Stern, VP of Business Intelligence & Analytics at BetMGM will address the importance of fostering a learning environment for driving success with analytics.

Adel Nehme

41 min

[Radar Recap] From Data Governance to Data Discoverability: Building Trust in Data Within Your Organization with Esther Munyi, Amy Grace, Stefaan Verhulst and Malarvizhi Veerappan

Esther Munyi, Amy Grace, Stefaan Verhulst and Malarvizhi Veerappan focus on strategies for improving data quality, fostering a culture of trust around data, and balancing robust governance with the need for accessible, high-quality data.

Richie Cotton

39 min

[Radar Recap] Scaling Data ROI: Driving Analytics Adoption Within Your Organization with Laura Gent Felker, Omar Khawaja and Tiffany Perkins-Munn

Laura, Omar and Tiffany explore best practices when it comes to scaling analytics adoption within the wider organization

Richie Cotton

40 min

50 Years of SQL with Don Chamberlin, Computer Scientist and Co-Inventor of SQL

Richie and Don explore the early development of SQL, the commercialization and adoption of SQL, how it became standardized, how it evolved and spread via open source, the future of SQL through NoSQL and SQL++ and much more.

Richie Cotton

36 min

See More See More

Master your data skills with DataCamp

Unlock your career with custom learning

Unicorn companies database

Using SQL GROUP BY

SQL GROUP BY Example 1

SQL GROUP BY Example 2

The limitations of WHERE

Using SQL HAVING

SQL HAVING Example 1

SQL HAVING Example 2

SQL Order of Execution

Take it to the next level

SQL Developer Salaries: Expectations in 2024

Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx

[Radar Recap] Building a Learning Culture for Analytics Functions, with Russell Johnson, Denisse Groenendaal-Lopez and Mark Stern

[Radar Recap] From Data Governance to Data Discoverability: Building Trust in Data Within Your Organization with Esther Munyi, Amy Grace, Stefaan Verhulst and Malarvizhi Veerappan

[Radar Recap] Scaling Data ROI: Driving Analytics Adoption Within Your Organization with Laura Gent Felker, Omar Khawaja and Tiffany Perkins-Munn

50 Years of SQL with Don Chamberlin, Computer Scientist and Co-Inventor of SQL

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Introduction to SQL

Intermediate SQL

Introduction to SQL Server

SQL Developer Salaries: Expectations in 2024

Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx

[Radar Recap] Building a Learning Culture for Analytics Functions, with Russell Johnson, Denisse Groenendaal-Lopez and Mark Stern

[Radar Recap] From Data Governance to Data Discoverability: Building Trust in Data Within Your Organization with Esther Munyi, Amy Grace, Stefaan Verhulst and Malarvizhi Veerappan

[Radar Recap] Scaling Data ROI: Driving Analytics Adoption Within Your Organization with Laura Gent Felker, Omar Khawaja and Tiffany Perkins-Munn

50 Years of SQL with Don Chamberlin, Computer Scientist and Co-Inventor of SQL

Using SQL `GROUP BY`

SQL `GROUP BY` Example 1

SQL `GROUP BY` Example 2

The limitations of `WHERE`

Introduction to SQL