PostgreSQL Summary Stats and Window Functions
Here you can access the summer_medals table used in the course. To access the table, you will need to specify the medals schema in your queries (e.g., medals.summer_medals).
Summer olympics dataset
- Each row represents a medal awarded in the Summer Olympics games
Explore Datasets
Use the summer_medals table to explore the data and practice your skills!
- Select the
athlete,event, andyearfrom thesummer_medalstable.- Add another column,
previous_winner, which contains the previous winner of the same event. - Filter your results for gold medalists.
- Add another column,
- Return the
year, total number of medalists per year, and running total number of medalists in the history of the Summer Olympics.- Order your results by year in ascending order.
- Return the
country,year, and the number of gold medals earned.- Limit your results to the years 2004, 2008, and 2012.
- Each country should have a subtotal of all gold medals earned across the three years.
SELECT *
FROM medals.summer_medals
LIMIT 5;Introduction to window functions
Course outline:
- Introduction to window functions
- Fetching, ranking, and paging, bin values in rows
- Aggregate window functions and frames
- Beyond window functions
Window functions perform an operation across a set of rows that are somehow related to the current row. They're similar to GROUP BY aggregate functions, but all rows remain in the output (instead of being grouped in a single row).
Uses:
- Fetching values from preceding or following rows (e.g fetching the previous row's value)
- Assigning ordinal ranks (1st, 2nd, etc.) to rows based on their values' positions in a sorted list
- Running totals, moving averages
-- The most basic thing you can do with window functions is assign row numbers. They allow you to reference a row by its posiiton or index as opposed to its values.
--Eg. This:
SELECT year, event, country
FROM medals.summer_medals
WHERE medal = 'Gold';-- Can be changed into this:
SELECT year, event, country,
ROW_NUMBER() OVER() AS row_n
FROM medals.summer_medals
WHERE medal = 'Gold';Anatomy of a window function
FUNCTION_NAME() OVER(...)
- The OVER clause indicates that a function is a window function
- The parenthesis after OVER() can contain subclauses, such as ORDER BY, PARTITION BY, and ROWS/RANGE, PRECEDING/FOLLOWING/UNBOUNDED
-- Querying the table to easiloy find in which year the 13th summer olympics were held
SELECT
year,
-- Assign numbers to each year
ROW_NUMBER() OVER() AS row_n
FROM (
SELECT DISTINCT(year)
FROM medals.summer_medals
ORDER BY year ASC
) AS years
ORDER BY Year ASC;
ORDER BY
- One of the subclauses within the OVER clause
- ORDER BY in OVER orders the rows related to the current row that the window function will use. E.g. ordering by year in descending order in ROW_NUMBER's OVER clause will assign 1 to the most recent year's rows
- You can order by multiple columns in the OVER clause, just like normal, and they'll change the numbers assign to each row
- You can order both inside and outside the OVER clause at the same time. First, ROW_NUMBER will assign numbers based on the order within OVER. After that, the ORDER outside of OVER takes over, and sorts the results of the table by that. The first row on the result won't necessarily be row_number 1.
- ORDER BY inside OVER takes effect before ORDER BY outside OVER.
SELECT year, country, event,
ROW_NUMBER() OVER(ORDER BY year DESC, event DESC) as row_n
-- the row numbers are given after sorting the table by year and event
FROM medals.summer_medals
WHERE medal = 'Gold'
ORDER BY country, row_n;
-- the results of the table will now be sorted by country and row number. SO the first row in the result isn't the row with number 1, because the two orders are based on different columns.-- Ranking olympic medalists by medals earned
WITH athlete_medals AS(
SELECT
-- Count the number of medals each athlete has earned
athlete,
count(medal) AS Medals
FROM medals.Summer_Medals
GROUP BY athlete)
SELECT
-- Number each athlete by how many medals they've earned
athlete,
ROW_NUMBER() OVER(ORDER BY medals DESC) AS row_n
FROM athlete_medals
ORDER BY Medals DESC;Using ORDER BY to determine reigning champions (a champion who's won both the previous and current year's competitions).
The previous and current year's champions need to be in the same row (in two different columns) so that they can be compared.
How can you get a previous value without complex self-joins? Using LAG.
LAG(column, n) OVER(...)returnscolumn's value at the rownrows before the current row.LAG(column, 1) OVER(...)returns the previous row's value
-- Step 1) getting each year's champions and wrapping it in a CTE to temporarily store a query's results as a table
WITH discus_gold AS(
SELECT year, country AS champion
FROM medals. summer_medals
WHERE year IN (1996, 2000, 2004, 2008, 2012)
AND gender = 'Men' AND medal = 'Gold' AND event = 'Discus Throw')
-- STEP 2) Using the previous CTE with LAG
SELECT year, champion,
LAG(champion, 1) OVER (ORDER BY year) AS last_champion
FROM discus_gold
ORDER BY year;