Skip to content

London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!

Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.

The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.

The data has been loaded into a Snowflake database called TFL with a single table called JOURNEYS, including the following data:

TFL.JOURNEYS

ColumnDefinitionData type
MONTHMonth in number format, e.g., 1 equals JanuaryINTEGER
YEARYearINTEGER
DAYSNumber of days in the given monthINTEGER
REPORT_DATEDate that the data was reportedDATE
JOURNEY_TYPEMethod of transport usedVARCHAR
JOURNEYS_MILLIONSMillions of journeys, measured in decimalsFLOAT

Note that in Snowflake all databases, tables, and columns are upper case by default.

You will execute SQL queries to answer three questions, as listed in the instructions.

Spinner
DataFrameas
df
variable
SELECT *
FROM TFL.JOURNEYS
LIMIT 5;
Hidden output
indexMONTHYEARDAYSREPORT_DATEJOURNEY_TYPEJOURNEYS_MILLIONS
012010312010-06-30T00:00:00.000Underground & DLR96.83639076
122010282010-06-30T00:00:00.000Underground & DLR90.33050361
232010312010-06-30T00:00:00.000Underground & DLR90.03801412
342010302010-09-30T00:00:00.000Underground & DLR92.5440925
452010312010-09-30T00:00:00.000Underground & DLR88.66291085

What are the most popular transport types, measured by the total number of journeys?

The output should contain two columns,

  1. JOURNEY_TYPE and;
  2. TOTAL_JOURNEYS_MILLIONS,

and be sorted by the second column in descending order. Save the query as most_popular_transport_types.

Spinner
DataFrameas
most_popular_transport_types
variable
-- most_popular_transport_types
SELECT TFL.JOURNEYS.JOURNEY_TYPE, SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY TFL.JOURNEYS.JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS DESC;
Hidden output
indexJOURNEY_TYPETOTAL_JOURNEYS_MILLIONS
0Bus24905.19394699
1Underground & DLR15020.466543504
2Overground1666.8456664279
3TfL Rail411.3134209833
4Tram314.6898754821
5Emirates Airline14.5837175749

Which five months and years were the most popular for the Emirates Airline?

Return an output containing MONTH, YEAR, and JOURNEYS_MILLIONS, with the latter rounded to two decimal places and aliased as ROUNDED_JOURNEYS_MILLIONS. Exclude null values and save the result as emirates_airline_popularity.

Spinner
DataFrameas
emirates_airline_popularity
variable
-- emirates_airline_popularity
SELECT MONTH, YEAR, ROUND(TFL.JOURNEYS.JOURNEYS_MILLIONS, 2) AS ROUNDED_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE TFL.JOURNEYS.JOURNEY_TYPE = 'Emirates Airline' AND
	MONTH IS NOT NULL AND
	YEAR IS NOT NULL AND
	TFL.JOURNEYS.JOURNEYS_MILLIONS IS NOT NULL
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
LIMIT 5;
Hidden output
indexMONTHYEARROUNDED_JOURNEYS_MILLIONS
0520120.53
1620120.38
2420120.24
3520130.19
4520150.19

Find the five years with the lowest volume of Underground & DLR journeys,

saving as least_popular_years_tube. The results should contain the columns YEAR, JOURNEY_TYPE, and TOTAL_JOURNEYS_MILLIONS.

Spinner
DataFrameas
least_popular_years_tube
variable
-- least_popular_years_tube
SELECT YEAR, TFL.JOURNEYS.JOURNEY_TYPE, SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE TFL.JOURNEYS.JOURNEY_TYPE = 'Underground & DLR'
GROUP BY TFL.JOURNEYS.YEAR, TFL.JOURNEYS.JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS
LIMIT 5;
Hidden output
indexYEARJOURNEY_TYPETOTAL_JOURNEYS_MILLIONS
02020Underground & DLR310.179316314
12021Underground & DLR748.4525442
22022Underground & DLR1064.8590086
32010Underground & DLR1096.14558838
42011Underground & DLR1156.64765448