PostgreSQL GROUP BY
The GROUP BY
clause in PostgreSQL is used to arrange identical data into groups. It is often used with aggregate functions like COUNT
, SUM
, AVG
, MAX
, or MIN
to perform operations on each group of data.
Usage
The GROUP BY
clause is used when you need to organize rows that have the same values in specified columns into summary rows. It follows the SELECT
statement and any aggregate functions to determine how the data should be grouped.
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;
In this syntax, GROUP BY column1
groups the result set by the values in column1
.
Examples
1. Basic Grouping
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
This example counts the number of employees in each department by grouping the results based on the department
column.
2. Grouping with SUM
SELECT department, SUM(salary)
FROM employees
GROUP BY department;
This query calculates the total salary paid per department, grouping the rows by department
and summing the salary
column.
3. Grouping with Multiple Columns
SELECT department, job_title, AVG(salary)
FROM employees
GROUP BY department, job_title;
Here, the query groups the data by both department
and job_title
, calculating the average salary for each unique combination of department and job title.
4. Grouping with HAVING
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
This example filters the grouped data to include only those departments with more than 10 employees using the HAVING
clause.
Tips and Best Practices
- Select only necessary columns. Use only the columns you intend to group by or aggregate to ensure efficient query execution.
- Order your results. Combine
GROUP BY
withORDER BY
to sort your aggregated results for easier analysis. Note thatORDER BY
is processed afterGROUP BY
, which can affect performance and results. - Use HAVING for filtering groups. Use the
HAVING
clause to filter groups after the aggregation stage, as opposed toWHERE
which filters rows before aggregation. - Ensure correct grouping. Always include all non-aggregated columns from the
SELECT
statement in theGROUP BY
clause to avoid errors. - Consider performance with large datasets. Be aware that using
GROUP BY
with large datasets can impact performance. Consider indexing the columns used inGROUP BY
to optimize query execution.