
London, or as the Romans called it, "Londinium"! It is home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!

Given that the city's roads were originally designed for horse and cart, this growth in area and population has required the development of an efficient public transport system! Since 2000, that system has been run by the local government body Transport for London (TfL), which is managed by the London Mayor's office. Its remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.

The Mayor of London's office makes its data available to the public. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.

The data has been loaded into a Google BigQuery database called TFL with a single table called JOURNEYS, including the following data:

TFL.JOURNEYS

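| Column            | Definition                                      | Data type |
|-------------------|-------------------------------------------------|-----------|
| MONTH             | Month in number format, e.g., 1 equals January  | INTEGER   |
| YEAR              | Year                                            | INTEGER   |
| DAYS              | Number of days in the given month               | INTEGER   |
| REPORT_DATE       | Date that the data was reported                 | DATE      |
| JOURNEY_TYPE      | Method of transport used                        | VARCHAR   |
| JOURNEYS_MILLIONS | Millions of journeys, measured in decimals      | FLOAT     |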

Note that the table name is upper case by default.

You will execute SQL queries to answer three questions, as listed in the instructions.

Query result stored as DataFrame variable: most_popular_transport_types
-- Finding the most popular transport types
SELECT 
	journey_type, 
	SUM(journeys_millions) AS total_journeys_millions
FROM 
	TFL.JOURNEYS
GROUP BY 
	journey_type
ORDER BY 
	total_journeys_millions DESC;
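
The query above can be tried without access to the hosted table by using a small local stand-in. The following is only a sketch under stated assumptions, not the project's setup: it uses Python's built-in sqlite3 module instead of BigQuery, attaches an in-memory database under the name TFL so that the TFL.JOURNEYS reference resolves, and fills the table with a handful of made-up rows. The function names build_sample_db and most_popular are hypothetical, and the figures are invented for illustration; they do not come from the TfL dataset.

```python
import sqlite3

# Same shape as the "most popular transport types" query above.
MOST_POPULAR_SQL = """
SELECT
    journey_type,
    SUM(journeys_millions) AS total_journeys_millions
FROM
    TFL.JOURNEYS
GROUP BY
    journey_type
ORDER BY
    total_journeys_millions DESC
"""


def build_sample_db():
    """Return an in-memory SQLite connection holding a toy TFL.JOURNEYS table."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no BigQuery-style datasets. Attaching a second in-memory
    # database named TFL lets the query keep the TFL.JOURNEYS reference.
    conn.execute("ATTACH DATABASE ':memory:' AS TFL")
    conn.execute(
        """
        CREATE TABLE TFL.JOURNEYS (
            MONTH INTEGER,
            YEAR INTEGER,
            DAYS INTEGER,
            REPORT_DATE TEXT,          -- SQLite has no DATE type; ISO text is used
            JOURNEY_TYPE TEXT,
            JOURNEYS_MILLIONS REAL
        )
        """
    )
    # Invented values, chosen only so the totals are easy to check by hand.
    sample_rows = [
        (1, 2020, 31, "2020-01-31", "Bus", 10.0),
        (2, 2020, 29, "2020-02-29", "Bus", 12.0),
        (1, 2020, 31, "2020-01-31", "Underground & DLR", 8.0),
        (2, 2020, 29, "2020-02-29", "Underground & DLR", 9.5),
        (1, 2020, 31, "2020-01-31", "Emirates Airline", 0.25),
        (2, 2020, 29, "2020-02-29", "Emirates Airline", 0.5),
    ]
    conn.executemany(
        "INSERT INTO TFL.JOURNEYS VALUES (?, ?, ?, ?, ?, ?)", sample_rows
    )
    return conn


def most_popular(conn):
    """Run the aggregation and return (journey_type, total) tuples."""
    return conn.execute(MOST_POPULAR_SQL).fetchall()


if __name__ == "__main__":
    for journey_type, total in most_popular(build_sample_db()):
        print(journey_type, total)
```

SQLite and BigQuery differ in types and in some sorting and quoting rules, so this stand-in is useful for checking the shape of a query rather than for reproducing the project's results.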
Query result stored as DataFrame variable: emirates_airline_popularity
-- Identifying the most popular months and years for Emirates Airline cable car travel
SELECT
	month, 
	year, 
	ROUND(journeys_millions, 2) AS rounded_journeys_millions
FROM 
	TFL.JOURNEYS
WHERE
	journey_type = 'Emirates Airline'
	AND journeys_millions IS NOT NULL
ORDER BY
	rounded_journeys_millions DESC, year ASC
LIMIT 5;
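
The filter, rounding, and ordering steps of the query above can be checked on toy data. The sketch below assumes Python's built-in sqlite3 module as a local stand-in for BigQuery; the helper name top_cable_car_months and every row value are hypothetical and invented for illustration. It includes one row with a missing journey count and one row for another transport type, to show that both are excluded before the top rows are taken.

```python
import sqlite3


def top_cable_car_months(rows, limit=5):
    """Return the top `limit` (month, year, rounded journeys) rows for the cable car.

    `rows` is an iterable of (month, year, journey_type, journeys_millions).
    """
    conn = sqlite3.connect(":memory:")
    # Attach an in-memory database named TFL so TFL.JOURNEYS resolves locally.
    conn.execute("ATTACH DATABASE ':memory:' AS TFL")
    conn.execute(
        "CREATE TABLE TFL.JOURNEYS ("
        "MONTH INTEGER, YEAR INTEGER, JOURNEY_TYPE TEXT, JOURNEYS_MILLIONS REAL)"
    )
    conn.executemany("INSERT INTO TFL.JOURNEYS VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT
            month,
            year,
            ROUND(journeys_millions, 2) AS rounded_journeys_millions
        FROM
            TFL.JOURNEYS
        WHERE
            journey_type = 'Emirates Airline'
            AND journeys_millions IS NOT NULL   -- drop months with no figure
        ORDER BY
            rounded_journeys_millions DESC,     -- busiest months first
            year ASC                            -- earlier year wins a tie
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result


# Made-up rows: two valid cable car months, one missing value, one bus row.
toy_rows = [
    (5, 2012, "Emirates Airline", 0.504),
    (6, 2013, "Emirates Airline", 0.248),
    (7, 2014, "Emirates Airline", None),
    (7, 2014, "Bus", 180.0),
]

if __name__ == "__main__":
    for row in top_cable_car_months(toy_rows):
        print(row)
```

Because the sort key is the rounded value, two months that differ only beyond the second decimal place count as tied, and the `year ASC` term then decides their order.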
Query result stored as DataFrame variable: least_popular_years_tube
-- Least popular years for the tube
SELECT
	year,
	journey_type,
	SUM(journeys_millions) AS total_journeys_millions
FROM 
	TFL.JOURNEYS
WHERE
	journey_type = 'Underground & DLR'
GROUP BY
	year,
	journey_type
ORDER BY
	total_journeys_millions ASC
LIMIT 5;