Window Functions in SQL

Window functions are used to perform calculations on a group of rows without combining them into a single result. This can be useful when you are calculating metrics such as an average over a period of time or a running total. Unlike a GROUP BY, it does not group rows into a single row. In this workspace, you will learn:

How to define a window function that includes a set of rows
How to divide the set of rows included within a window function into partitions or frames
How to calculate moving averages and running totals.

This workspace uses the Olympics data from the course PostgreSQL Summary Stats and Window Functions. You are free to create an integration to your own data or use one of the existing integrations. You can learn more about integrations here.

Calculating row numbers

The example below uses ROW_NUMBER() to assign a row number to each row. The row numbers are assigned based on the results of the query. The OVER() clause defines the range of rows upon which the calculations would be performed. By default, the OVER() clause will use the entire result of the query.

Note: ROW_NUMBER() assigns a unique number to each row. If you want rows with the same value to receive identical numbers, you can use RANK() or DENSE_RANK(). Refer to this video to learn more about the differences between these two functions.

👇 To run a SQL cell like the one below, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell.

Unknown integration

DataFrameavailable as

df

variable

SELECT 
	year, 
    athlete,
    -- Assign a row number to athletes
	ROW_NUMBER() OVER() AS athlete_number
FROM medals.summer_medals
LIMIT 5

This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Ordering the rows within a window

To define an order, you can use ORDER BY within the OVER() clause. This order is applied to the range of rows defined within the window function. In the example below, row numbers are assigned to each event based on the descending number of tickets sold.

If you want to know more about ordering in window functions, you can check out this video.

Unknown integration

DataFrameavailable as

df

variable

SELECT 
  athlete,
  medal_count, 
  -- Assign row numbers to athletes based on the descending number of medals won
  ROW_NUMBER() OVER(ORDER BY medal_count DESC) as medal_ranking
FROM 
  (
    -- Fetch athlete and the number of medals they won
    SELECT athlete, COUNT(*) AS medal_count
    FROM medals.summer_medals
    GROUP BY athlete
  ) AS medal_counts

This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Calculating window functions over partitions

OVER() has a clause called PARTITION BY, which divides the range of rows defined within the window function into partitions. If it is not used, the entire result of the query is treated as one partition. The window is divided into partitions based on the country in the example below.

You can learn more about partitions in this video.

Unknown integration

DataFrameavailable as

df

variable

SELECT 
  athlete,
  country,
  medal_count, 
  --Assign row numbers to each country based on the descending number of medals won while partitioning by country
  ROW_NUMBER() OVER(PARTITION BY country ORDER BY medal_count DESC) AS medal_ranking
FROM 
  (
    -- Fetch athlete, country, and the number of medals athlete won
    SELECT 
		athlete,
      	country,
      	COUNT(*) AS medal_count
    FROM medals.summer_medals
    WHERE country IN ('QAT', 'PHI')
    GROUP BY athlete, country
  ) AS medal_counts
ORDER BY country, medal_count DESC

This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Calculating a moving average

Up until now, ROW_NUMBER() was used to assign numbers to rows. Window functions can also be used to perform aggregations over rows. Let's begin by calculating a moving average. Moving averages take the average value over a previous number of rows and can help eliminate noise from data and identify trends.

To calculate the moving average, we again use OVER(), but this time in combination with the AVG() function. By default, this will calculate the average over all previous rows.

You can find more information about moving averages here.

Unknown integration

DataFrameavailable as

ticket_prices

variable

SELECT 
  year,
  medal_count,
  -- Calculate the moving average
  AVG(medal_count) OVER(ORDER BY year) as average_medal_count 
FROM 
  (
    -- Fetch the year and number of medals won
    SELECT 
      	year, COUNT(*) AS medal_count
    FROM medals.summer_medals
    -- Narrow the results to Canada
    WHERE country = 'CAN'
    GROUP BY year
  ) AS canadian_medals

This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Adjusting the range of window functions

Frames are used to define the range of rows for a windows function. They are specified within OVER(). The range can be defined by using ROWS BETWEEN [start] and [finish]. The start and finish can be one of the three clauses - PRECEDING, CURRENT ROW, and FOLLOWING.

For example:

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW will define a frame that starts two rows before the current row and ends at the current row.
ROWS BETWEEN 1 PRECEDING AND 3 FOLLOWING will define a frame that starts one row before the current row and ends three rows after the current row.
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW will define a frame that starts from the first row and ends at the current row.

The following query calculates the moving average for the Canadian team's medal count for the past three summer Olympics. You can find more information about frames here.

Unknown integration

DataFrameavailable as

moving_average

variable

SELECT 
  year,
  medal_count,
  -- Calculate moving average over 3 rows
  AVG(medal_count) OVER(ORDER BY year ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as last_three_average_medal_count 
FROM 
  (
    -- Fetch the year and number of medals won
    SELECT 
      	year,
      	COUNT(*) AS medal_count
    FROM medals.summer_medals
    -- Narrow the results to Canada
    WHERE country = 'CAN'
    GROUP BY year
  ) AS canadian_medals

This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Plotting the moving average

The plot below shows how a moving average removes noise by plotting both the original column and the moving average.

# Import libraries and set plotting style
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('darkgrid')
plt.rcParams["figure.figsize"]=(12, 6)

# Create the line plots
sns.lineplot(x='year', y='medal_count', data=moving_average)
sns.lineplot(x='year', y='last_three_average_medal_count', data=moving_average)

# Add labels and show the plot
plt.title('Medals Won by Canada', fontsize=20)
plt.xlabel('Date', fontsize=15) 
plt.ylabel('Number of Medals', fontsize=15)
plt.xticks(rotation ='45')
plt.legend(labels=["Average Medal Count","Medal Count"])
plt.show()

Calculating a running total

A running total calculates the total sum of a column over the preceding values and the current value. It is performed in much the same way as a moving average, except that SUM() is used in place of AVG().

The following query calculates the total number of medals won by Canada over time.

Unknown integration

DataFrameavailable as

running_total

variable

SELECT 
  year,
  medal_count,
  -- Calculate a running total
  SUM(medal_count) OVER(ORDER BY year) as total_medal_count 
FROM 
  (
    -- Fetch the year and number of medals won
    SELECT 
      	year,
      	COUNT(*) AS medal_count
    FROM medals.summer_medals
    -- Narrow the results to Canada
    WHERE country = 'CAN'
    GROUP BY year
  ) AS canadian_medals

This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

‌
‌
‌

Window Functions in SQL

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Window Functions in SQL

Calculating row numbers

Ordering the rows within a window

Calculating window functions over partitions

Calculating a moving average

Adjusting the range of window functions

Plotting the moving average

Calculating a running total

Window Functions in SQL