Skip to content
Exploring London's Travel Network (Basic Snowflake SQL)
The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.
The data has been loaded into a Snowflake database called TFL with a single table called JOURNEYS, including the following data:
TFL.JOURNEYS
| Column | Definition | Data type |
|---|---|---|
MONTH | Month in number format, e.g., 1 equals January | INTEGER |
YEAR | Year | INTEGER |
DAYS | Number of days in the given month | INTEGER |
REPORT_DATE | Date that the data was reported | DATE |
JOURNEY_TYPE | Method of transport used | VARCHAR |
JOURNEYS_MILLIONS | Millions of journeys, measured in decimals | FLOAT |
Note that in Snowflake all databases, tables, and columns are upper case by default.
You will execute SQL queries to answer three questions...
- What are the most popular transport types, measured by the total number of journeys? The output should contain two columns, 1)
JOURNEY_TYPEand 2)TOTAL_JOURNEYS_MILLIONS, and be sorted by the second column in descending order. Save the query asmost_popular_transport_types
DataFrameas
most_popular_transport_types
variable
WITH most_popular_transport_types AS (
SELECT
JOURNEY_TYPE,
SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS DESC
)
SELECT *
FROM most_popular_transport_types;- Which five months and years were the most popular for the Emirates Airline? Return an output containing
MONTH,YEAR, andJOURNEYS_MILLIONS, with the latter rounded to two decimal places and aliased asROUNDED_JOURNEYS_MILLIONS. Exclude null values and save the result asemirates_airline_popularity
DataFrameas
emirates_airline_popularity
variable
WITH emirates_airline_popularity AS (
SELECT
MONTH,
YEAR,
ROUND(JOURNEYS_MILLIONS, 2) AS ROUNDED_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE = 'Emirates Airline'
AND JOURNEYS_MILLIONS IS NOT NULL
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
LIMIT 5
)
SELECT *
FROM emirates_airline_popularity;- Find the five years with the lowest volume of Underground & DLR journeys, saving as
least_popular_years_tube. The results should contain the columnsYEAR,JOURNEY_TYPE, andTOTAL_JOURNEYS_MILLIONS.
DataFrameas
least_popular_years_tube
variable
WITH least_popular_years_tube AS
(
SELECT
YEAR,
JOURNEY_TYPE,
SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE = 'Underground & DLR'
GROUP BY YEAR, JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS ASC
LIMIT 5
)
SELECT *
FROM least_popular_years_tube;