London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!
Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.
The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.
The data has been loaded into a Snowflake database called TFL with a single table called JOURNEYS, including the following data:
TFL.JOURNEYS
| Column | Definition | Data type |
|---|---|---|
MONTH | Month in number format, e.g., 1 equals January | INTEGER |
YEAR | Year | INTEGER |
DAYS | Number of days in the given month | INTEGER |
REPORT_DATE | Date that the data was reported | DATE |
JOURNEY_TYPE | Method of transport used | VARCHAR |
JOURNEYS_MILLIONS | Millions of journeys, measured in decimals | FLOAT |
Note that in Snowflake all databases, tables, and columns are upper case by default.
You will execute SQL queries to answer three questions, as listed in the instructions.
-- most_popular_transport_types
SELECT JOURNEY_TYPE,
SUM(JOURNEYS_MILLIONS) AS TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
GROUP BY JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS DESC;
This query:
- Groups all records by JOURNEY_TYPE (e.g., Bus, Tube, Tram, Emirates Airline, etc.).
- Sums the values in the JOURNEYS_MILLIONS column for each journey type — giving you total journeys in millions for each type.
- Orders the results from highest to lowest based on the total journeys.
This helps answer the question:
- What are the most popular transport types, measured by the total number of journeys?
- When visualized, it becomes clear that the bus is the most popular mode of transport, followed by the underground and DLR, then the overground, TFL rail, tram, and finally, the Emirates Airline.
-- emirates_airline_popularity
SELECT MONTH,
YEAR,
ROUND(JOURNEYS_MILLIONS,2) AS ROUNDED_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE = 'Emirates Airline' AND ROUNDED_JOURNEYS_MILLIONS IS NOT NULL
ORDER BY ROUNDED_JOURNEYS_MILLIONS DESC
LIMIT 5;Second Query:
- Filters only rows where JOURNEY_TYPE is Emirates Airline.
- Removes rows where JOURNEYS_MILLIONS is null.
- Rounds the journey counts to 2 decimal places.
- Returns the top 5 months and years with the highest Emirates Airline usage.
- The chart reveals that May 2012 had the highest usage of Emirates Airline, followed by June (month 6) and April (month 4). In 2013 and 2015, May was the only month with the highest usage of the airline.
-- least_popular_years_tube
SELECT YEAR,
JOURNEY_TYPE,
SUM(JOURNEYS_MILLIONS) as TOTAL_JOURNEYS_MILLIONS
FROM TFL.JOURNEYS
WHERE JOURNEY_TYPE LIKE '%Underground%'
GROUP BY YEAR, JOURNEY_TYPE
ORDER BY TOTAL_JOURNEYS_MILLIONS
LIMIT 5;- Filters only records where JOURNEY_TYPE includes "Underground".
- Groups the data by each year and journey type combo.
- Calculates the total number of journeys (in millions) per year per journey type.
- Orders the results from lowest to highest total journeys.
- Shows only the 5 years with the lowest total Underground journey counts.
- The chart shows that 2010 had the lowest total underground journey counts, while 2011 recorded the highest.