London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!
Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.
The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.
The data has been loaded into a Google BigQuery database called TFL with a single table called JOURNEYS, including the following data:
TFL.JOURNEYS
| Column | Definition | Data type |
|---|---|---|
MONTH | Month in number format, e.g., 1 equals January | INTEGER |
YEAR | Year | INTEGER |
DAYS | Number of days in the given month | INTEGER |
REPORT_DATE | Date that the data was reported | DATE |
JOURNEY_TYPE | Method of transport used | VARCHAR |
JOURNEYS_MILLIONS | Millions of journeys, measured in decimals | FLOAT |
Note that the table name is upper case* by default.
You will execute SQL queries to answer three questions, as listed in the instructions.
-- most_popular_transport_types
--Select journey type and sum of journey volumes for each journey type
SELECT journey_type,SUM(journeys_millions) AS total_journeys_millions
--Table name is case sensitive
FROM TFL.JOURNEYS
--We need to group results by journey type
GROUP BY journey_type
--Order results by sum of journey volumes per journey type in descending order
ORDER BY total_journeys_millions DESC;SELECT * FROM TFL.JOURNEYS-- emirates_airline_popularity
--Select month, year and round the sum of all journeys for each month-year pair to 2 decimal places
SELECT month,year, ROUND(SUM(journeys_millions),2) AS rounded_journeys_millions
--Remember that the table name is case sensitive
FROM TFL.JOURNEYS
--Filter journey type for Emirates Airline and exclude null values
WHERE journey_type='Emirates Airline' AND journeys_millions IS NOT NULL
--Group by for each month-year pair
GROUP BY month,year
--Order results by rounded_journeys_millions in descending order and year in ascending order
ORDER BY rounded_journeys_millions DESC,year ASC
--Limit results to the first 5 rows
LIMIT 5;-- least_popular_years_tube
--Select year, journey_type and sum of all journeys for each year
SELECT year, journey_type, SUM(journeys_millions) AS total_journeys_millions
--Remember that the table name is case sensitive
FROM TFL.JOURNEYS
--Filter journey type for Underground & DLR
WHERE journey_type='Underground & DLR'
--Group by year and journey type
GROUP BY year,journey_type
--Order results by sum of all journeys for each year in ascending order
ORDER BY total_journeys_millions
--Limit results to 5
LIMIT 5;