Skip to content

London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!

Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.

The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.

The data has been loaded into a Google BigQuery database called TFL with a single table called JOURNEYS, including the following data:

TFL.JOURNEYS

ColumnDefinitionData type
MONTHMonth in number format, e.g., 1 equals JanuaryINTEGER
YEARYearINTEGER
DAYSNumber of days in the given monthINTEGER
REPORT_DATEDate that the data was reportedDATE
JOURNEY_TYPEMethod of transport usedVARCHAR
JOURNEYS_MILLIONSMillions of journeys, measured in decimalsFLOAT

Note that the table name is upper case* by default.

You will execute SQL queries to answer three questions, as listed in the instructions.

Spinner
DataFrameas
most_popular_transport_types
variable
-- most_popular_transport_types

-- Query to find the most popular transport types based on the total number of journeys
SELECT 
    -- Select the journey type (method of transport used)
    JOURNEY_TYPE,
    
    -- Sum the number of journeys in millions for each journey type
    SUM(JOURNEYS_MILLIONS) AS total_journeys_millions
FROM 
    -- From the TFL.JOURNEYS table where the data is stored
    TFL.JOURNEYS

-- Group the results by the journey type to aggregate journeys for each transport method
GROUP BY 
    JOURNEY_TYPE

-- Order the results by the total number of journeys in descending order (most popular first)
ORDER BY 
    total_journeys_millions DESC;

Spinner
DataFrameas
emirates_airline_popularity
variable
-- emirates_airline_popularity

-- Query to find the five most popular months and years for Emirates Airline
SELECT 
    -- Select the month and year
    MONTH,
    YEAR,
    
    -- Round the number of journeys in millions to two decimal places and alias as 'rounded_journeys_millions'
    ROUND(JOURNEYS_MILLIONS, 2) AS rounded_journeys_millions
FROM 
    -- From the TFL.JOURNEYS table
    TFL.JOURNEYS

-- Filter the data to include only the rows for Emirates Airline (assuming 'Emirates Airline' is a specific journey type)
WHERE 
    JOURNEY_TYPE = 'Emirates Airline'

-- Exclude rows where JOURNEYS_MILLIONS is null
AND JOURNEYS_MILLIONS IS NOT NULL

-- Order the results by the rounded number of journeys in descending order, then by year in ascending order
ORDER BY 
    rounded_journeys_millions DESC,
    YEAR ASC

-- Limit the output to the top 5 rows
LIMIT 5;

Spinner
DataFrameas
least_popular_years_tube
variable
-- least_popular_years_tube

-- Purpose: Find the five years with lowest Underground & DLR journey volumes
-- Output: Returns year, journey_type, and total annual journeys in millions (rounded to 2 decimal places)
-- Table used: TFL.JOURNEYS
-- Order: Ascending by total journeys (lowest first)
-- Limit: 5 records

-- Create a CTE to handle the yearly aggregation
WITH yearly_journeys AS (
    SELECT 
        YEAR,
        JOURNEY_TYPE,
        -- Round to 2 decimal places for cleaner presentation
        ROUND(SUM(JOURNEYS_MILLIONS), 2) as total_journeys_millions
    FROM 
        TFL.JOURNEYS
    WHERE 
        -- Filter specifically for Underground & DLR journeys
        JOURNEY_TYPE = 'Underground & DLR'
    GROUP BY 
        -- Group by both year and journey type to get annual totals
        YEAR,
        JOURNEY_TYPE
)

-- Main query to select and order the results
SELECT 
    YEAR as year,  -- Aliased to lowercase for consistency
    JOURNEY_TYPE as journey_type,  -- Aliased to lowercase for consistency
    total_journeys_millions
FROM 
    yearly_journeys
ORDER BY 
    -- Sort by total journeys ascending to get lowest values first
    total_journeys_millions ASC
LIMIT 5;  -- Return only the 5 years with lowest journey volumes