D4GC 2022 - Introduction to Workspace (Solution)

Which locations have the longest commutes in Vancouver?

Now that we have the basics under our belt, we will dive in and perform a quick analysis of Vancouver commute data. We will use both 2016 census data and geographical data downloaded from Vancouver's Open Data Portal (license).

Our goal is to determine in which areas of Vancouver residents report the longest commutes. We will start by importing some packages.

# Import some useful packages
import geopandas as gpd
import pandas as pd
import plotly.express as px

Now that we have our packages, we can use the read_csv() function to read in the "vancouver_commutes.csv" file from our file browser. Setting index_col="Area" uses the area names as the row labels.

# Create a pandas DataFrame from the commute data
van_data = pd.read_csv("vancouver_commutes.csv", index_col="Area")

# Preview the data
van_data
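To see what index_col does, here is a minimal sketch that reads a tiny in-memory CSV instead of the real file. The area names, the "Less than 15 minutes" column, and all counts are invented for illustration; only the "Area" index and the two long-commute column names come from this notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for vancouver_commutes.csv (invented areas and counts).
csv_text = """Area,Less than 15 minutes,45 to 59 minutes,60 minutes and over
Area A,600,120,80
Area B,150,200,150
"""

# index_col="Area" turns the Area column into the row labels,
# so every remaining column holds a numeric commuter count.
demo = pd.read_csv(io.StringIO(csv_text), index_col="Area")

print(demo.index.tolist())    # ['Area A', 'Area B']
print(demo.columns.tolist())  # ['Less than 15 minutes', '45 to 59 minutes', '60 minutes and over']
```

Keeping the area names out of the columns matters later: summing across a row then adds up only counts, never text.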

In its current form, the data is hard to visualize: each column holds a count of commuters in one commute-duration band. Let's create a column that represents the percentage of each area's commuters whose commute is 45 minutes or longer.

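We will create a new column "Percent" by dividing the sum of the two longest-duration columns ("45 to 59 minutes" and "60 minutes and over") by each row's total across all columns (.sum(axis=1)), then multiplying by 100.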

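# Create the new column: commutes of 45+ minutes as a share of all commuters in each area
# Exclude "Percent" from the row total so re-running this cell gives the same result
total = van_data.drop(columns="Percent", errors="ignore").sum(axis=1)
van_data["Percent"] = (van_data["60 minutes and over"] + van_data["45 to 59 minutes"]) / total * 100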

# Review the data
van_data
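A small worked example makes the arithmetic concrete. This sketch uses two hypothetical areas with invented counts; only the two long-commute column names are taken from the notebook.

```python
import pandas as pd

# Invented counts for two hypothetical areas.
demo = pd.DataFrame(
    {
        "Less than 15 minutes": [600, 150],
        "45 to 59 minutes": [120, 200],
        "60 minutes and over": [80, 150],
    },
    index=pd.Index(["Area A", "Area B"], name="Area"),
)

# axis=1 sums across the columns of each row, giving one total per area:
# Area A: 600 + 120 + 80 = 800, Area B: 150 + 200 + 150 = 500.
row_total = demo.sum(axis=1)

# Long commutes: Area A: 120 + 80 = 200, Area B: 200 + 150 = 350.
long_commutes = demo["45 to 59 minutes"] + demo["60 minutes and over"]

# 200 / 800 * 100 = 25.0 and 350 / 500 * 100 = 70.0
demo["Percent"] = long_commutes / row_total * 100
print(demo["Percent"])
```

With the default axis=0, .sum() would instead total each column over all areas, which is not the denominator the percentage needs.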

Great! Now we can load the GeoJSON file that we'll use to map our commute data. To do this, we use geopandas' read_file() function, which returns a GeoDataFrame from the "vancouver_areas.geojson" file.

We will also make the "name" column the index using .set_index(), so that its area names can be matched against the "Area" index of the commute data when merging.

# Read in the geojson file 
van_boundaries = gpd.read_file("vancouver_areas.geojson")

# Set the index
van_boundaries.set_index("name", inplace=True)

# Preview the file
van_boundaries

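We will use the .merge() method to join our commute and location data on their shared area-name index. Because .merge() performs an inner join by default, only areas that appear in both files are kept.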

# Merge the data
van_df = van_boundaries.merge(van_data, 
                              left_index=True, 
                              right_index=True)

# Preview the final DataFrame
van_df
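Since an inner join silently drops unmatched rows, it can be worth checking which area names failed to line up. The sketch below uses two small invented DataFrames in place of van_boundaries and van_data (the "shape_id" column and all area names are hypothetical) and uses plain pandas .merge() with how="outer" and indicator=True to flag mismatches.

```python
import pandas as pd

# Invented stand-ins for van_boundaries and van_data, both indexed by area name.
boundaries = pd.DataFrame(
    {"shape_id": [1, 2, 3]},
    index=pd.Index(["Area A", "Area B", "Area C"], name="name"),
)
commutes = pd.DataFrame(
    {"Percent": [25.0, 70.0]},
    index=pd.Index(["Area A", "Area B"], name="Area"),
)

# Default how="inner": only labels present in both indexes survive,
# so "Area C" disappears without any warning.
inner = boundaries.merge(commutes, left_index=True, right_index=True)
print(inner.index.tolist())  # ['Area A', 'Area B']

# An outer merge with indicator=True adds a "_merge" column recording
# whether each row came from the left frame, the right frame, or both.
check = boundaries.merge(
    commutes, left_index=True, right_index=True, how="outer", indicator=True
)
print(check.loc[check["_merge"] != "both", "_merge"])  # Area C is flagged as left_only
```

Any "left_only" or "right_only" rows point to area names spelled differently in the two files.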

Finally, we can visualize the data using a choropleth map in Plotly Express. We will make use of a number of parameters to ensure our plot is set up correctly:

data_frame (first argument): the GeoDataFrame we are plotting.
geojson: the geometry used to draw each area.
locations: the values (here, the area names in the index) matched against each GeoJSON feature's id to decide which shape each row colors.
color_continuous_scale: the color scale of our areas.
fitbounds: constrains the map view to the bounds of the plotted locations.
color: the column used to color each region.
template: sets the aesthetics of the plot.
title: adds a descriptive title.

Last, we use .update_geos() to set the map frame's color to white so it blends into the background, and then .show() to render our figure!

# Create the figure
fig = px.choropleth(van_df, 
                    geojson=van_df.geometry, 
                    locations=van_df.index,
                    color_continuous_scale=["white", "#8F2800"],
                    fitbounds="locations",
                    color="Percent",
                    template="plotly_white",
                    title="<b>The suburbs have the worst commutes</b><br><sup>Percentage of Vancouver population with commute over 45 minutes</sup>"
                   )

# Set the frame color to white
fig.update_geos(framecolor="white")

# Show the plot
fig.show()
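The locations parameter is easy to misread. As we understand plotly's documented behavior, each value in locations is matched against the id of a feature in geojson (the default featureidkey is "id"), and geopandas uses the GeoDataFrame index as each feature's id when it exports geometry, which is why passing van_df.index lines up. The sketch below reproduces only that matching step in plain Python, using a hand-written GeoJSON dict; the area names and coordinates are invented, and plotly is not called.

```python
# Hypothetical GeoJSON with two unit squares; ids and coordinates are invented.
def square(x0, y0):
    """Return a GeoJSON Polygon for a 1x1 square whose lower-left corner is (x0, y0)."""
    ring = [[x0, y0], [x0 + 1, y0], [x0 + 1, y0 + 1], [x0, y0 + 1], [x0, y0]]
    return {"type": "Polygon", "coordinates": [ring]}


geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "Area A", "properties": {}, "geometry": square(0, 0)},
        {"type": "Feature", "id": "Area B", "properties": {}, "geometry": square(1, 0)},
    ],
}

# Values that would be passed as `locations`; "Area Z" has no matching feature.
locations = ["Area A", "Area B", "Area Z"]

# Mimic the default featureidkey="id": match each location to a feature id.
feature_ids = {feature["id"] for feature in geojson["features"]}
unmatched = [loc for loc in locations if loc not in feature_ids]
print(unmatched)  # ['Area Z']
```

As far as we know, a location with no matching feature is simply not drawn, so a check like this is a quick way to track down regions missing from the map.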
