Skip to main content

ROW_NUMBER SQL Function: How to Display Row Numbers

The ROW_NUMBER() SQL function assigns sequential integers to rows within a result set, optionally partitioning the data and ordering the rows within each partition.
Jun 12, 2024  · 6 min read

In SQL, it’s common for datasets to be unordered, which can make the analysis challenging. To understand how rows relate within a dataset, we can use the ROW_NUMBER() function.

This function assigns sequential numbers to rows within a result set, providing a clear order for further manipulation and analysis. This can be done for the dataset as a whole or for different groups of data within the dataset. 

This article assumes prior knowledge of the fundamentals of SQL. We’ll cover the basics of the commonly used ROW_NUMBER() function and provide examples of increasing difficulty.

ROW_NUMBER() Syntax

Here's the basic syntax for the ROW_NUMBER() function:

ROW_NUMBER() OVER([PARTITION BY value expression, ... ] [ORDER BY order_by_clause])

Let's break down the key components:

  • ROW_NUMBER(): This is the function itself, which generates sequential row numbers.
  • OVER (...): This clause is mandatory for window functions like ROW_NUMBER(). It defines the context in which row numbers are calculated.
  • PARTITION BY value_expression: This optional clause divides the result set into partitions based on the specified column(s) or expression(s). Row numbers are then calculated independently within each partition.
  • ORDER BY order_by_clause: This optional clause specifies the order in which row numbers are assigned within each partition (or the entire result set if no PARTITION BY is used).

To illustrate, here's how we might use ROW_NUMBER() within a broader SQL query:

SELECT Val_1, 
    ROW_NUMBER() OVER(PARTITION BY group_1, ORDER BY number DESC) AS rn
FROM Data;

ROW_NUMBER() Examples

In the following three examples, we’ll use the free DataLab IDE. We’ll use the Employees sample dataset (already incorporated into DataLab), which has the following four columns:

  • first_name: string field
  • last_name: string field
  • gender: string field with two values (“M” or “F”)
  • hire_date: the date the employee was hired

We can query the dataset using the following SQL code:

SELECT e.first_name, e.last_name, e.gender, e.hire_date
    FROM employees.employees e
LIMIT 100; -- Optionally reduce the size of the output

Sample from the Employees dataset

Numbering employees in alphabetical order

Before using ROW_NUMBER(), it's important to define our goal—this will clarify if and how we want to partition and order. In this example, we’d like to order all employees alphabetically. We don’t need a PARTITION BY clause because we order all employees in the dataset. We’ll order customers by their last name (last_name). We’ll name our numbering name_row_number.

SELECT e.first_name, e.last_name, e.gender, e.hire_date,
    ROW_NUMBER() OVER(ORDER BY e.last_name) AS name_row_number    
FROM employees.employees e;

Output of ROW_NUMBER()

To handle ties (employees with the same last name), we can refine the ordering by adding more columns. In the example below, we order first by last_name, and then, in cases where an employee’s last name is the same as someone else's, we’ll order by their first name (first_name).

SELECT e.first_name, e.last_name, e.gender, e.hire_date,
    ROW_NUMBER() OVER(ORDER BY e.last_name, e.first_name) AS name_row_number    
FROM employees.employees e;

ROW_NUMBER() result with multiple ORDER BY columns

Numbering employees by newest hire date within genders

Now, let's order employees from newest to oldest hire date within their respective genders. We'll again use the ORDER BY clause to sort by hire_date, but this time in descending order (using DESC) to prioritize the most recent hires.

To achieve separate numbering for each gender, we'll introduce the PARTITION BY gender clause. This means row numbers will restart from 1 for each distinct gender.

Here's the complete query:

SELECT e.first_name, e.last_name, e.gender, e.hire_date,
    ROW_NUMBER() OVER(PARTITION BY gender ORDER BY hire_date DESC) AS hire_row_number    
FROM employees.employees e;

ROW_NUMBER() example with PARTITION BY()

We could then query this data using a WHERE clause to find the most experienced employee in each gender:

WITH RankedEmployees AS (
    SELECT e.first_name, e.last_name, e.gender, e.hire_date,
        ROW_NUMBER() OVER(PARTITION BY gender ORDER BY hire_date DESC) AS hire_row_number
    FROM employees.employees e
)
SELECT first_name, last_name, gender, hire_date
FROM RankedEmployees
WHERE hire_row_number = 1;

ROW_NUMBER() example with WHERE

Ordering employees by salary within genders using a JOIN

In our final example, we'll rank employees by their salary, considering their gender. To achieve this, we'll join the employees table with the salaries table based on the emp_no column:

SELECT e.first_name, e.last_name, e.gender, e.hire_date, s.salary
    FROM employees.employees e
JOIN employees.salaries s ON e.emp_no = s.emp_no
LIMIT 100;

Data from joined tables

Now, we'll use both PARTITION BY and ORDER BY. We'll partition by gender to have separate rankings for each gender and order by salary in descending order to rank the highest earners first.

Here's the complete query:

SELECT e.first_name, e.last_name, e.gender, e.hire_date, s.salary,
    ROW_NUMBER () OVER(PARTITION BY e.gender ORDER BY s.salary DESC) AS salary_row_number
    FROM employees.employees e
JOIN employees.salaries s ON e.emp_no = s.emp_no
LIMIT 100;

ROW_NUMBER() example

To compare the top salaries for each gender, we can filter the results using a WHERE clause. The query below will return the top 5 earners for each gender, ordered by their rank within their gender group. Such queries can provide insights into pay equity within the dataset.

WITH RankedSalaries AS (
    SELECT e.first_name, e.last_name, e.gender, e.hire_date, s.salary,
        ROW_NUMBER() OVER(PARTITION BY e.gender ORDER BY s.salary DESC) AS salary_row_number
    FROM employees.employees e
    JOIN employees.salaries s ON e.emp_no = s.emp_no
)
SELECT first_name, last_name, gender, hire_date, salary
FROM RankedSalaries
WHERE salary_row_number <= 5
ORDER BY salary_row_number, gender;

ROW_NUMBER() example

Conclusion

The ROW_NUMBER() function is useful when we have an unordered dataset and want to assign a clear sequential numbering of the rows for further analysis. We define the specific order of these numbers using ORDER BY and define separate numbering sequences for distinct groups within the data using PARTITION BY.

If you found this article useful and want to learn more about SQL, check out our other SQL courses.


Author
Kevin Babitz
LinkedIn

Data Science writer | Senior Technical Marketing Analyst at Wayfair | MSE in Data Science at University of Pennsylvania

Topics

Learn more about SQL with these courses!

course

Data Manipulation in SQL

4 hr
226K
Master the complex SQL queries necessary to answer a wide variety of data science questions and prepare robust data sets for analysis in PostgreSQL.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

cheat-sheet

SQL Window Functions Cheat Sheet

With this SQL Window Functions cheat sheet, you'll have a handy reference guide to the various types of window functions in SQL.

Richie Cotton

10 min

tutorial

COUNT() SQL FUNCTION

COUNT() lets you count the number of rows that match certain conditions. Learn how to use it in this tutorial.
Travis Tang 's photo

Travis Tang

3 min

tutorial

Aggregate Functions in SQL

Learn how to use aggregate functions for summarizing results and gaining useful insights about data in SQL.
Sayak Paul's photo

Sayak Paul

9 min

tutorial

Understanding the LAG() Function in SQL: A Comprehensive Guide

Explore how the LAG() function allows you to access previous rows in your dataset, enabling time series analysis and comparisons of sequential observations.
Islam Salahuddin's photo

Islam Salahuddin

11 min

tutorial

Rank Formula in Excel: A Comprehensive Guide With Examples

Learn how to rank data in Excel with RANK(), RANK.EQ(), and RANK.AVG() functions. Understand their differences, applications, and tips for accurate data analysis.
Laiba Siddiqui's photo

Laiba Siddiqui

30 min

tutorial

FORMAT() SQL FUNCTION

FORMAT() is one of the most commonly used functions in SQL. Learn its main applications in this tutorial.
Travis Tang 's photo

Travis Tang

3 min

See MoreSee More