PostgreSQL GROUP BY

The GROUP BY clause in PostgreSQL is used to arrange identical data into groups. It is often used with aggregate functions like COUNT, SUM, AVG, MAX, or MIN to perform operations on each group of data.

Usage

The GROUP BY clause is used when you need to organize rows that have the same values in specified columns into summary rows. It follows the SELECT statement and any aggregate functions to determine how the data should be grouped.

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;

In this syntax, GROUP BY column1 groups the result set by the values in column1.

Examples

1. Basic Grouping

SELECT department, COUNT(*)
FROM employees
GROUP BY department;

This example counts the number of employees in each department by grouping the results based on the department column.

2. Grouping with SUM

SELECT department, SUM(salary)
FROM employees
GROUP BY department;

This query calculates the total salary paid per department, grouping the rows by department and summing the salary column.

3. Grouping with Multiple Columns

SELECT department, job_title, AVG(salary)
FROM employees
GROUP BY department, job_title;

Here, the query groups the data by both department and job_title, calculating the average salary for each unique combination of department and job title.

4. Grouping with HAVING

SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;

This example filters the grouped data to include only those departments with more than 10 employees using the HAVING clause.

Tips and Best Practices

Select only necessary columns. Use only the columns you intend to group by or aggregate to ensure efficient query execution.
Order your results. Combine GROUP BY with ORDER BY to sort your aggregated results for easier analysis. Note that ORDER BY is processed after GROUP BY, which can affect performance and results.
Use HAVING for filtering groups. Use the HAVING clause to filter groups after the aggregation stage, as opposed to WHERE which filters rows before aggregation.
Ensure correct grouping. Always include all non-aggregated columns from the SELECT statement in the GROUP BY clause to avoid errors.
Consider performance with large datasets. Be aware that using GROUP BY with large datasets can impact performance. Consider indexing the columns used in GROUP BY to optimize query execution.