Skip to content
Project: Exploring London's Travel Network
  • AI Chat
  • Code
  • Report
  • London, or as the Romans called it "Londonium"! Home to over 8.5 million residents who speak over 300 languages. While the City of London is a little over one square mile (hence its nickname "The Square Mile"), Greater London has grown to encompass 32 boroughs spanning a total area of 606 square miles!

    Given the city's roads were originally designed for horse and cart, this area and population growth has required the development of an efficient public transport system! Since the year 2000, this has been through the local government body called Transport for London, or TfL, which is managed by the London Mayor's office. Their remit covers the London Underground, Overground, Docklands Light Railway (DLR), buses, trams, river services (clipper and Emirates Airline cable car), roads, and even taxis.

    The Mayor of London's office make their data available to the public here. In this project, you will work with a slightly modified version of a dataset containing information about public transport journey volume by transport type.

    The data has been loaded into an AWS Redshift database called tfl with a single table called journeys, including the following data:

    tfl.journeys

    ColumnDefinitionData type
    monthMonth in number format, e.g., 1 equals JanuaryINTEGER
    yearYearINTEGER
    daysNumber of days in the given monthINTEGER
    report_dateDate that the data was reportedDATE
    journey_typeMethod of transport usedVARCHAR
    journeys_millionsMillions of journeys, measured in decimalsFLOAT

    You will execute SQL queries to answer three questions, as listed in the instructions.

    Unknown integration
    DataFrameavailable as
    most_popular_transport_types
    variable
    -- most_popular_transport_types
    select
    	journey_type,
    	sum(journeys_millions) as total_journeys_millions
    from tfl.journeys
    group by
    	journey_type
    order by
    	total_journeys_millions desc
    
    
    Unknown integration
    DataFrameavailable as
    emirates_airline_popularity
    variable
    -- emirates_airline_popularity
    with non_null_emirates_rides as (
    	select
    		month,
    		year,
    		round(journeys_millions,2) as rounded_journeys_millions
    	from tfl.journeys
    	where
    		journey_type = 'Emirates Airline' and
    		journeys_millions is not null
    )
    
    select *
    from non_null_emirates_rides
    order by rounded_journeys_millions desc
    limit 5
    
    Unknown integration
    DataFrameavailable as
    least_popular_years_tube
    variable
    -- least_popular_years_tube
    -- emirates_airline_popularity
    with underground_years as (
    	select
    	 	year,
    		journey_type,
    		sum(journeys_millions) as total_journeys_millions
    	from tfl.journeys
    	where
    		journey_type = 'Underground & DLR'
    	group by
    		year,
    		journey_type
    )
    
    select *
    from underground_years
    order by
    	total_journeys_millions asc
    limit 5