Skip to content
PostgreSQL Summary Stats and Window Functions
  • AI Chat
  • Code
  • Report
  • Spinner

    PostgreSQL Summary Stats and Window Functions

    Here you can access the summer_medals table used in the course. To access the table, you will need to specify the medals schema in your queries (e.g., medals.summer_medals).

    Summer olympics dataset

    • Each row represents a medal awarded in the Summer Olympics games

    Explore Datasets

    Use the summer_medals table to explore the data and practice your skills!

    • Select the athlete, event, and year from the summer_medals table.
      • Add another column, previous_winner, which contains the previous winner of the same event.
      • Filter your results for gold medalists.
    • Return the year, total number of medalists per year, and running total number of medalists in the history of the Summer Olympics.
      • Order your results by year in ascending order.
    • Return the country, year, and the number of gold medals earned.
      • Limit your results to the years 2004, 2008, and 2012.
      • Each country should have a subtotal of all gold medals earned across the three years.
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT *
    FROM medals.summer_medals
    LIMIT 5;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Introduction to window functions

    Course outline:

    • Introduction to window functions
    • Fetching, ranking, and paging, bin values in rows
    • Aggregate window functions and frames
    • Beyond window functions

    Window functions perform an operation across a set of rows that are somehow related to the current row. They're similar to GROUP BY aggregate functions, but all rows remain in the output (instead of being grouped in a single row).

    Uses:
    • Fetching values from preceding or following rows (e.g fetching the previous row's value)
    • Assigning ordinal ranks (1st, 2nd, etc.) to rows based on their values' positions in a sorted list
    • Running totals, moving averages
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- The most basic thing you can do with window functions is assign row numbers. They allow you to reference a row by its posiiton or index as opposed to its values.
    --Eg. This:
    SELECT year, event, country
    FROM medals.summer_medals
    WHERE medal = 'Gold';
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Can be changed into this:
    SELECT year, event, country,
        ROW_NUMBER() OVER() AS row_n
    FROM medals.summer_medals
    WHERE medal = 'Gold';
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Anatomy of a window function

    FUNCTION_NAME() OVER(...)

    • The OVER clause indicates that a function is a window function
    • The parenthesis after OVER() can contain subclauses, such as ORDER BY, PARTITION BY, and ROWS/RANGE, PRECEDING/FOLLOWING/UNBOUNDED
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Querying the table to easiloy find in which year the 13th summer olympics were held
    
    SELECT
      year,
      -- Assign numbers to each year
      ROW_NUMBER() OVER() AS row_n
    FROM (
      SELECT DISTINCT(year)
      FROM medals.summer_medals
      ORDER BY year ASC
    ) AS years
    ORDER BY Year ASC;
    
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    ORDER BY
    • One of the subclauses within the OVER clause
    • ORDER BY in OVER orders the rows related to the current row that the window function will use. E.g. ordering by year in descending order in ROW_NUMBER's OVER clause will assign 1 to the most recent year's rows
    • You can order by multiple columns in the OVER clause, just like normal, and they'll change the numbers assign to each row
    • You can order both inside and outside the OVER clause at the same time. First, ROW_NUMBER will assign numbers based on the order within OVER. After that, the ORDER outside of OVER takes over, and sorts the results of the table by that. The first row on the result won't necessarily be row_number 1.
    • ORDER BY inside OVER takes effect before ORDER BY outside OVER.
    Unknown integration
    DataFrameavailable as
    df
    variable
    SELECT year, country, event, 
        ROW_NUMBER() OVER(ORDER BY year DESC, event DESC) as row_n
        -- the row numbers are given after sorting the table by year and event
    FROM medals.summer_medals
    WHERE medal = 'Gold'
    ORDER BY country, row_n;
    -- the results of the table will now be sorted by country and row number. SO the first row in the result isn't the row with number 1, because the two orders are based on different columns.
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Ranking olympic medalists by medals earned
    
    WITH athlete_medals AS(
    SELECT
      -- Count the number of medals each athlete has earned
      athlete,
      count(medal) AS Medals
    FROM medals.Summer_Medals
    GROUP BY athlete)
        
    SELECT
        -- Number each athlete by how many medals they've earned
        athlete,
        ROW_NUMBER() OVER(ORDER BY medals DESC) AS row_n
    FROM athlete_medals
    ORDER BY Medals DESC;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

    Using ORDER BY to determine reigning champions (a champion who's won both the previous and current year's competitions).
    The previous and current year's champions need to be in the same row (in two different columns) so that they can be compared.

    How can you get a previous value without complex self-joins? Using LAG.

    • LAG(column, n) OVER(...) returns column's value at the row n rows before the current row.
      • LAG(column, 1) OVER(...) returns the previous row's value
    Unknown integration
    DataFrameavailable as
    df
    variable
    -- Step 1) getting each year's champions and wrapping it in a CTE to temporarily store a query's results as a table
    
    WITH discus_gold AS(
    SELECT year, country AS champion
    FROM medals. summer_medals
    WHERE year IN (1996, 2000, 2004, 2008, 2012)
        AND gender = 'Men' AND medal = 'Gold' AND event = 'Discus Throw')
    
    -- STEP 2) Using the previous CTE with LAG
    SELECT year, champion, 
        LAG(champion, 1) OVER (ORDER BY year) AS last_champion
    FROM discus_gold
    ORDER BY year;
    This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.