London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!
Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.
The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.
The data has been loaded into a Google BigQuery database called TFL with a single table called JOURNEYS, including the following data:
TFL.JOURNEYS
| Column | Definition | Data type |
|---|---|---|
MONTH | Month in number format, e.g., 1 equals January | INTEGER |
YEAR | Year | INTEGER |
DAYS | Number of days in the given month | INTEGER |
REPORT_DATE | Date that the data was reported | DATE |
JOURNEY_TYPE | Method of transport used | VARCHAR |
JOURNEYS_MILLIONS | Millions of journeys, measured in decimals | FLOAT |
Note that the table name is upper case* by default.
You will execute SQL queries to answer three questions, as listed in the instructions.
-- most_popular_transport_types
-- Query to find the most popular transport types based on the total number of journeys
SELECT
-- Select the journey type (method of transport used)
JOURNEY_TYPE,
-- Sum the number of journeys in millions for each journey type
SUM(JOURNEYS_MILLIONS) AS total_journeys_millions
FROM
-- From the TFL.JOURNEYS table where the data is stored
TFL.JOURNEYS
-- Group the results by the journey type to aggregate journeys for each transport method
GROUP BY
JOURNEY_TYPE
-- Order the results by the total number of journeys in descending order (most popular first)
ORDER BY
total_journeys_millions DESC;
-- emirates_airline_popularity
-- Query to find the five most popular months and years for Emirates Airline
SELECT
-- Select the month and year
MONTH,
YEAR,
-- Round the number of journeys in millions to two decimal places and alias as 'rounded_journeys_millions'
ROUND(JOURNEYS_MILLIONS, 2) AS rounded_journeys_millions
FROM
-- From the TFL.JOURNEYS table
TFL.JOURNEYS
-- Filter the data to include only the rows for Emirates Airline (assuming 'Emirates Airline' is a specific journey type)
WHERE
JOURNEY_TYPE = 'Emirates Airline'
-- Exclude rows where JOURNEYS_MILLIONS is null
AND JOURNEYS_MILLIONS IS NOT NULL
-- Order the results by the rounded number of journeys in descending order, then by year in ascending order
ORDER BY
rounded_journeys_millions DESC,
YEAR ASC
-- Limit the output to the top 5 rows
LIMIT 5;
-- least_popular_years_tube
-- Purpose: Find the five years with lowest Underground & DLR journey volumes
-- Output: Returns year, journey_type, and total annual journeys in millions (rounded to 2 decimal places)
-- Table used: TFL.JOURNEYS
-- Order: Ascending by total journeys (lowest first)
-- Limit: 5 records
-- Create a CTE to handle the yearly aggregation
WITH yearly_journeys AS (
SELECT
YEAR,
JOURNEY_TYPE,
-- Round to 2 decimal places for cleaner presentation
ROUND(SUM(JOURNEYS_MILLIONS), 2) as total_journeys_millions
FROM
TFL.JOURNEYS
WHERE
-- Filter specifically for Underground & DLR journeys
JOURNEY_TYPE = 'Underground & DLR'
GROUP BY
-- Group by both year and journey type to get annual totals
YEAR,
JOURNEY_TYPE
)
-- Main query to select and order the results
SELECT
YEAR as year, -- Aliased to lowercase for consistency
JOURNEY_TYPE as journey_type, -- Aliased to lowercase for consistency
total_journeys_millions
FROM
yearly_journeys
ORDER BY
-- Sort by total journeys ascending to get lowest values first
total_journeys_millions ASC
LIMIT 5; -- Return only the 5 years with lowest journey volumes