(SQL) Project: Exploring London's Travel Network

London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!

Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.

The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.

The data has been loaded into a Snowflake database called TFL with a single table called JOURNEYS, including the following data:

TFL.JOURNEYS

Column	Definition	Data type
`MONTH`	Month in number format, e.g., `1` equals January	`INTEGER`
`YEAR`	Year	`INTEGER`
`DAYS`	Number of days in the given month	`INTEGER`
`REPORT_DATE`	Date that the data was reported	`DATE`
`JOURNEY_TYPE`	Method of transport used	`VARCHAR`
`JOURNEYS_MILLIONS`	Millions of journeys, measured in decimals	`FLOAT`

Note that in Snowflake all databases, tables, and columns are upper case by default.

DataFrameas

df

variable

-- most_popular_transport_types
SELECT * FROM TFL.JOURNEYS

1. What are the most popular transport types, measured by the total number of journeys?

The output should contain two columns, 1) journey_type and 2) total_journeys_millions, and be sorted by the second column in descending order.

DataFrameas

most_popular_transport_types

variable

-- most_popular_transport_types
SELECT JOURNEY_TYPE, SUM(JOURNEYS_MILLIONS) AS MILES
FROM TFL.JOURNEYS
GROUP BY JOURNEY_TYPE
HAVING SUM(JOURNEYS_MILLIONS) > 0
ORDER BY 2 DESC

DataFrameas

df8

variable

-- DETAILED above average distance travelled
SELECT MONTH, YEAR, DAYS, REPORT_DATE, JOURNEY_TYPE, JOURNEYS_MILLIONS AS MILES
FROM TFL.JOURNEYS
WHERE JOURNEYS_MILLIONS > (SELECT AVG(JOURNEYS_MILLIONS) FROM TFL.JOURNEYS)
ORDER BY MILES DESC
LIMIT 10

DataFrameas

df13

variable

-- AVG journeys_millions
SELECT AVG(JOURNEYS_MILLIONS) AS AVG_MILES FROM TFL.JOURNEYS
WHERE JOURNEYS_MILLIONS IS NOT NULL

DataFrameas

df17

variable

-- Query 1
SELECT 
    journey_type, AVG(journeys_millions) AS avg_journey_millions
FROM  TFL.JOURNEYS
GROUP BY journey_type
ORDER BY avg_journey_millions DESC;

DataFrameas

df11

variable

-- A2
SELECT 
    journey_type, AVG(journeys_millions) AS avg_journey_millions
FROM  TFL.JOURNEYS
GROUP BY journey_type
HAVING AVG(journeys_millions) > (
        SELECT 
            AVG(journeys_millions)
        FROM 
            TFL.JOURNEYS)
ORDER BY avg_journey_millions DESC;

DataFrameas

df18

variable

-- B1 (best)
-- look also B3 (2nd best)
SELECT JOURNEY_TYPE, AVG(JOURNEYS_MILLIONS) AS AVG_MILES
FROM TFL.JOURNEYS
WHERE JOURNEYS_MILLIONS > (SELECT AVG(JOURNEYS_MILLIONS) FROM TFL.JOURNEYS)
GROUP BY JOURNEY_TYPE
ORDER BY AVG_MILES DESC

2. Which five months and years were the most popular for the Emirates Airline?

Return an output containing month, year, and journeys_millions, with the latter rounded to two decimal places and aliased as rounded_journeys_millions.

DataFrameas

df2

variable

-- emirates_airline_popularity
SELECT JOURNEY_TYPE, COUNT(JOURNEY_TYPE) AS journey_type_count
FROM TFL.JOURNEYS
GROUP BY JOURNEY_TYPE

DataFrameas

df7

variable

-- emirates_airline_popularity
SELECT month, year, ROUND(MAX(journeys_millions),2) AS rounded_journeys_millions
FROM TFL.JOURNEYS
WHERE journey_type = 'Emirates Airline' AND journeys_millions IS NOT NULL
GROUP BY month, year
ORDER BY rounded_journeys_millions DESC
LIMIT 5

DataFrameas

emirates_airline_popularity

variable

-- emirates_airline_popularity
SELECT month, year, MAX(journeys_millions) AS most_popular 
FROM TFL.JOURNEYS
WHERE journey_type = 'Emirates Airline'
GROUP BY month, year
ORDER BY most_popular DESC
LIMIT 1

3. Find the five years with the lowest volume of Underground & DLR journeys.

The results should contain the columns year, journey_type, and total_journeys_millions.

DataFrameas

df6

variable

-- least_popular_years_tube_based_on_total_number_of_journeys
SELECT year, journey_type, SUM(JOURNEYS_MILLIONS) AS total_journeys_millions
FROM TFL.JOURNEYS
WHERE journey_type = 'Underground & DLR'
GROUP BY year, journey_type
ORDER BY total_journeys_millions ASC
LIMIT 5

‌
‌
‌

(SQL) Project: Exploring London's Travel Network

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}TFL.JOURNEYS

1. What are the most popular transport types, measured by the total number of journeys?

2. Which five months and years were the most popular for the Emirates Airline?

3. Find the five years with the lowest volume of Underground & DLR journeys.

TFL.JOURNEYS