London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!
Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.
The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.
The data has been loaded into a Snowflake database called TFL with a single table called JOURNEYS, including the following data:
TFL.JOURNEYS
| Column | Definition | Data type |
|---|---|---|
MONTH | Month in number format, e.g., 1 equals January | INTEGER |
YEAR | Year | INTEGER |
DAYS | Number of days in the given month | INTEGER |
REPORT_DATE | Date that the data was reported | DATE |
JOURNEY_TYPE | Method of transport used | VARCHAR |
JOURNEYS_MILLIONS | Millions of journeys, measured in decimals | FLOAT |
Note that in Snowflake all databases, tables, and columns are upper case by default.
You will execute SQL queries to answer three questions, as listed in the instructions.
-- Explore the data in the table
SELECT *
FROM TFL.JOURNEYS
LIMIT 5;SELECT
summer.JOURNEY_TYPE,
SUM(summer.JOURNEYS_MILLIONS) AS summer_journeys,
SUM(winter.JOURNEYS_MILLIONS) AS winter_journeys
FROM TFL.JOURNEYS AS summer
LEFT JOIN TFL.JOURNEYS AS winter
ON summer.JOURNEY_TYPE = winter.JOURNEY_TYPE
AND winter.MONTH IN (12, 1, 2)
WHERE summer.MONTH IN (6, 7, 8)
GROUP BY summer.JOURNEY_TYPE
ORDER BY summer.JOURNEY_TYPE;-- Seasonal Journey Patterns: summer vs. winter influence of journeys made on different transport systems
SELECT
TFL.JOURNEYS.JOURNEY_TYPE,
CASE
WHEN TFL.JOURNEYS.MONTH IN (6, 7, 8) THEN 'Summer'
WHEN TFL.JOURNEYS.MONTH IN (12, 1, 2) THEN 'Winter'
END AS season,
SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) AS total_journeys
FROM TFL.JOURNEYS
WHERE TFL.JOURNEYS.MONTH IN (6, 7, 8, 12, 1, 2)
GROUP BY season, TFL.JOURNEYS.JOURNEY_TYPE
ORDER BY season, TFL.JOURNEYS.JOURNEY_TYPE;-- Transport Popularity Over Time: Analyzing the rise or fall in popularity of different transport types (e.g., buses, tube, Emirates Airline) over the years (excluding null values)
SELECT
TFL.JOURNEYS.YEAR,
TFL.JOURNEYS.JOURNEY_TYPE,
SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY TFL.JOURNEYS.YEAR,
TFL.JOURNEYS.JOURNEY_TYPE
HAVING SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) IS NOT NULL
ORDER BY TFL.JOURNEYS.JOURNEY_TYPE,
TFL.JOURNEYS.YEAR;-- Monthly Trend Analysis: A query to find out which month has the highest average number of journeys across all transport types.
SELECT
TFL.JOURNEYS.MONTH,
SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY TFL.JOURNEYS.MONTH
ORDER BY SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) DESC;-- most_popular_transport_types
SELECT
TFL.JOURNEYS.JOURNEY_TYPE,
SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY TFL.JOURNEYS.JOURNEY_TYPE
ORDER BY SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) DESC;-- emirates_airline_popularity
SELECT
TFL.JOURNEYS.MONTH,
TFL.JOURNEYS.YEAR,
ROUND(SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS), 2) AS ROUNDED_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE TFL.JOURNEYS.JOURNEYS_MILLIONS IS NOT NULL
AND TFL.JOURNEYS.JOURNEY_TYPE = 'Emirates Airline'
GROUP BY
TFL.JOURNEYS.MONTH,
TFL.JOURNEYS.YEAR
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
LIMIT 5;-- least_popular_years_tube
SELECT
TFL.JOURNEYS.YEAR,
TFL.JOURNEYS.JOURNEY_TYPE,
SUM(TFL.JOURNEYS.JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE TFL.JOURNEYS.JOURNEY_TYPE = 'Underground & DLR'
GROUP BY
TFL.JOURNEYS.YEAR,
TFL.JOURNEYS.JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS ASC
LIMIT 5;