PostgreSQL DISTINCT
The `DISTINCT` keyword in PostgreSQL is used to return unique values from a column or a combination of columns, eliminating duplicate entries. It is often used in conjunction with the `SELECT` statement to ensure the result set contains only distinct records.
Usage
The `DISTINCT` keyword is employed when you need to remove duplicate rows from your query results. It follows the `SELECT` keyword and applies to the columns specified thereafter.
SELECT DISTINCT column1, column2, ...
FROM table_name;
In this syntax, `DISTINCT` ensures that the combination of `column1`, `column2`, etc., is unique in the result set.
Examples
1. Basic Unique Values
SELECT DISTINCT country
FROM customers;
This example retrieves a list of unique countries from the `customers` table, eliminating any duplicate entries. Note that `NULL` values are treated as distinct entries.
2. Unique Combinations
SELECT DISTINCT first_name, last_name
FROM employees;
Here, the query returns unique combinations of `first_name` and `last_name` from the `employees` table.
3. Using DISTINCT with ORDER BY
SELECT DISTINCT category
FROM products
ORDER BY category ASC;
This example retrieves distinct product categories and orders them alphabetically.
4. Usage with DISTINCT ON
SELECT DISTINCT ON (column1) column1, column2
FROM table_name
ORDER BY column1, column2;
In PostgreSQL, `DISTINCT ON` allows you to select distinct rows based on specific columns, which can be useful for more complex scenarios.
Tips and Best Practices
- Use wisely with large datasets. `DISTINCT` can be resource-intensive due to the sorting and comparison operations involved, so use it judiciously to avoid performance issues.
- Combine with `ORDER BY`. Pair `DISTINCT` with `ORDER BY` to sort unique results as needed.
- Limit columns when possible. Apply `DISTINCT` to the smallest number of columns to reduce complexity and execution time.
- Check for necessary uniqueness. Ensure that using `DISTINCT` is truly needed for your specific query to avoid unnecessary overhead.