GROUP BY and HAVING Clause in SQL
An important component for Analyst to summarize the data such as sales, profit, cost, and salary. Data Summarization is very helpful for Analyst to create a visualization, conclude findings, and report writing. In SQL, GROUP BY Clause is one of the tools to summarize or aggregate the data series. For example, sum up the daily sales and combine in a single quarter and show it to the senior management. Similarly, if you want to count how many employees in each department of the company. It groups the databases on the basis of one or more column and aggregates the results.
After Grouping the data, you can filter the grouped record using HAVING Clause. HAVING Clause returns the grouped records which match the given condition. You can also sort the grouped records using ORDER BY. ORDER BY used after GROUP BY on aggregated column.
In this tutorial, you are going to learn GROUP BY Clause in detail with relevant examples. Here is the list of topics that you will learn in this tutorial:
- Group By Clause
- Having Clause
- Aggregate Functions
- Compare Having and Where Clause in SQL
- GROUP BY With JOIN Example
- GROUP BY Comparison with other Clause
- Hands-on Practice Assignment
Group By Clause
The GROUP BY Clause is utilized in SQL with the SELECT statement to organize similar data into groups. It combines the multiple records in single or more columns using some functions. Generally, these functions are aggregate functions such as min(),max(),avg(), count(), and sum() to combine into single or multiple columns. It uses the split-apply-combine strategy for data analysis.
- In the
split phase, It divides the groups with its values.
- In the
apply phase, It applies the aggregate function and generates a single value.
- In the
combiner phase, It combines the groups with single values into a single value.
Points to Remember:
- GROUP BY Clause is utilized with the SELECT statement.
- GROUP BY aggregates the results on the basis of selected column: COUNT, MAX, MIN, SUM, AVG, etc.
- GROUP BY returns only one result per group of data.
- GROUP BY Clause always follows the WHERE Clause.
- GROUP BY Clause always precedes the ORDER BY Clause(http://slideplayer.com/slide/15440670/).
In above example, Table is grouped based on the DeptID column and Salary is aggregated department-wise.
HAVING Clause utilized in SQL as a conditional Clause with GROUP BY Clause. This conditional clause returns rows where aggregate function results matched with given conditions only. It added in the SQL because WHERE Clause cannot be combined with aggregate results, so it has a different purpose. The primary purpose of the WHERE Clause is to deal with non-aggregated or individual records.
- HAVING Clause always utilized in combination with GROUP BY Clause.
- HAVING Clause restricts the data on the group records rather than individual records.
- WHERE and HAVING can be used in a single query.
In above example, Table is grouped based on DeptID column and these grouped rows filtered using HAVING Clause with condition AVG(Salary) > 3000.
Aggregate functions used to combine the result of a group into a single such as COUNT, MAX, MIN, AVG, SUM, STDDEV, and VARIANCE. These functions also known as multiple-row functions.
SUM(): Returns the sum or total of each group.
COUNT(): Returns the number of rows of each group.
AVG(): Returns the average and mean of each group.
MIN(): Returns the minimum value of each group.
MAX(): Returns the minimum value of each group.
Compare Having and Where Clause in SQL
- In some cases, you need to filter out the individual records. In such cases, you can use WHERE Clause, Whereas in other cases you need to filter the groups with the specific condition. In such cases, you can use HAVING Clause.
- WHERE Clause filters the records tuple by tuple while HAVING Clause filters the whole group.
- A query may have both the clauses( WHERE and HAVING Clause).
- Where Clause applied first and then Having Clause.
- WHERE Clause restricts records before GROUP BY Clause, whereas HAVING Clause restricts groups after GROUP BY Clause are performed.
- WHERE Clause can be utilized with SELECT, UPDATE, DELETE, and INSERT, whereas HAVING can be utilized only with SELECT statement.
GROUP BY With JOIN Example
The normalized relational database breaks down the complex table into small tables, which helps you to eliminate the data redundancy, inconsistency and ensure there is no loss of information. Normalized tables require joining data from multiple tables.
In above example, Employee and Department are joined using the common column DeptID.
In the above example, JOIN and GROUP BY both clauses used together in a single query. After joining both tables(Employee and Department), joined table grouped by Department name.
GROUP BY Comparison with Other Clause
Compare GROUP BY and DISTINCT
DISTINCT returns the unique values present in the column while GROUP BY returns unique/distinct items with the aggregate resultant column. In the following example you can see the DISTINCT values in the
Compare GROUP BY and ORDER BY
ORDER BY returns sorted items in ascending and descending order while GROUP BY returns unique items with the aggregate resultant column. In the following example, you can see the ORDER BY or sorted salary table.
Hands-on Practice Assignment
Table Name: Books
Columns: ISBN, Title, Publication Date, Price, Publisher
Write SQL queries for the following statements and share your answers in comments:
- Determine how many books are in each category.
- Determine how many books are in the Management category.
- Determine the average book price of each category.
- List the price of the least expensive book in each category.
Source: This Assignment is inspired from the book "Oracle 11g SQL" by John Casteel.
Congratulations, you have made it to the end of this tutorial!
In this tutorial, you have covered a lot of details about the GROUP BY and HAVING Clause. You have learned what the GROUP BY and HAVING Clause are with examples, Comparison between HAVING and WHERE Clause in SQL, GROUP BY with JOIN, and GROUP BY Comparison with DISTINCT and ORDER BY. In the last section, you have a Hands-on practice assignment to assess your knowledge.
Hopefully, you can now utilize GROUP BY and HAVING Clause concept to analyze your own datasets. Thanks for reading this tutorial!
If you are interested in learning more about SQL, take DataCamp's Intermediate SQL course.