Skip to content

The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.

The data has been loaded into a Snowflake database called TFL with a single table called JOURNEYS, including the following data:

TFL.JOURNEYS

ColumnDefinitionData type
MONTHMonth in number format, e.g., 1 equals JanuaryINTEGER
YEARYearINTEGER
DAYSNumber of days in the given monthINTEGER
REPORT_DATEDate that the data was reportedDATE
JOURNEY_TYPEMethod of transport usedVARCHAR
JOURNEYS_MILLIONSMillions of journeys, measured in decimalsFLOAT

Note that in Snowflake all databases, tables, and columns are upper case by default.

You will execute SQL queries to answer three questions...


  1. What are the most popular transport types, measured by the total number of journeys? The output should contain two columns, 1) JOURNEY_TYPE and 2) TOTAL_JOURNEYS_MILLIONS, and be sorted by the second column in descending order. Save the query as most_popular_transport_types
Spinner
DataFrameas
most_popular_transport_types
variable
WITH most_popular_transport_types AS (
    SELECT 
        JOURNEY_TYPE,
        SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
    FROM TFL.JOURNEYS
    GROUP BY JOURNEY_TYPE
    ORDER BY TOTAL_JOURNEYS_MILLIONS DESC
)
SELECT *
FROM most_popular_transport_types;

  1. Which five months and years were the most popular for the Emirates Airline? Return an output containing MONTH, YEAR, and JOURNEYS_MILLIONS, with the latter rounded to two decimal places and aliased as ROUNDED_JOURNEYS_MILLIONS. Exclude null values and save the result as emirates_airline_popularity
Spinner
DataFrameas
emirates_airline_popularity
variable
WITH emirates_airline_popularity AS (
	SELECT 
		MONTH,
    	YEAR,
    	ROUND(JOURNEYS_MILLIONS, 2) AS ROUNDED_JOURNEYS_MILLIONS
	FROM TFL.JOURNEYS
	WHERE JOURNEY_TYPE = 'Emirates Airline'
  		AND JOURNEYS_MILLIONS IS NOT NULL
	ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
	LIMIT 5
)
SELECT  *
FROM emirates_airline_popularity;

  1. Find the five years with the lowest volume of Underground & DLR journeys, saving as least_popular_years_tube. The results should contain the columns YEAR, JOURNEY_TYPE, and TOTAL_JOURNEYS_MILLIONS.
Spinner
DataFrameas
least_popular_years_tube
variable
WITH least_popular_years_tube AS
(
	SELECT 
    	YEAR,
    	JOURNEY_TYPE,
    	SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
	FROM TFL.JOURNEYS
	WHERE JOURNEY_TYPE = 'Underground & DLR'
	GROUP BY YEAR, JOURNEY_TYPE
	ORDER BY TOTAL_JOURNEYS_MILLIONS ASC
	LIMIT 5
)
SELECT *
FROM least_popular_years_tube;