Where to open a new coffee shop?
Background
You are helping a client who owns coffee shops in Colorado. The company's coffee shops serve high-quality and responsibly sourced coffee, pastries, and sandwiches. They operate three locations in Fort Collins and want to expand into Denver.
Your client believes that the ideal location for a new store is close to affluent households and that the store appeals to the 20-35 year old demographic.
Your team collected geographical and demographic information about Denver's neighborhoods to assist the search. They also collected data for Starbucks stores in Denver. Starbucks and the new coffee shops do not compete for the same clients; the team included their location as a reference.
The data
You have assembled information from three different sources (locations, neighborhoods, demographics):
Starbucks locations in Denver, Colorado
- "StoreNumber" - Store Number as assigned by Starbucks
- "Name" - Name identifier for the store
- "PhoneNumber" - Phone number for the store
- "Street 1, 2, and 3" - Address for the store
- "PostalCode" - Zip code of the store
- "Longitude, Latitude" - Coordinates of the store
Neighborhoods' geographical information
- "NBHD_ID" - Neighborhood ID (matches the census information)
- "NBHD_NAME" - Name of the statistical neighborhood
- "Geometry" - Polygon that defines the neighborhood
Demographic information
- "NBHD_ID" - Neighborhood ID (matches the geographical information)
- "NBHD_NAME" - Neighborhood name
- "POPULATION_2010" - Population in 2010
- "AGE_ " - Number of people in each age bracket (< 18, 18-34, 35-65, and > 65)
- "NUM_HOUSEHOLDS" - Number of households in the neighborhood
- "FAMILIES" - Number of families in the neighborhood
- "NUM_HHLD_100K+" - Number of households with income above 100 thousand USD per year
Starbucks locations were scraped from the Starbucks store locator webpage by Chris Meller.
Statistical Neighborhood information from the City of Denver Open Data Catalog, CC BY 3.0 license.
Census information from the United States Census Bureau. Publicly available information.
The Approach
We approach this problem on the assumption that the best candidates are the neighborhoods that rank highly on all of our client's criteria at once:
- Neighborhoods with highest count of target population
- Neighborhoods with highest proportion of target population
- Neighborhoods with highest count of households with income higher than 100k
We will create three maps of the Denver neighborhoods, one for each criterion, then look for the highest-ranking neighborhoods. We will attempt to find neighborhoods that are in the top 20 for all three.
To begin, we make sure we have the tables we want by creating any necessary merges and sorting by values of interest.
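Before merging, it can help to confirm that the key column matches one-to-one between the two tables, since a missing or duplicated NBHD_ID would silently drop or duplicate neighborhoods. The following is a minimal sketch on made-up data, not the project data: the values and the names `toy_neighborhoods`, `toy_census`, and `toy_merged` are assumptions for illustration. It relies on the `validate` and `indicator` options of pandas `merge`.

```python
import pandas as pd

# Toy stand-ins for the neighborhoods and census tables (made-up values).
toy_neighborhoods = pd.DataFrame({
    "NBHD_ID": [1, 2, 3],
    "NBHD_NAME": ["A", "B", "C"],
})
toy_census = pd.DataFrame({
    "NBHD_ID": [1, 2, 3],
    "POPULATION_2010": [1000, 2500, 400],
    "AGE_18_TO_34": [300, 1200, 50],
})

# validate="one_to_one" raises an error if NBHD_ID is duplicated on either side.
# how="outer" with indicator=True records which table each row came from.
toy_merged = toy_neighborhoods.merge(
    toy_census, on="NBHD_ID", how="outer", validate="one_to_one", indicator=True
)

# Rows that are not present in both tables would show up here.
unmatched = toy_merged[toy_merged["_merge"] != "both"]
print(len(toy_merged), len(unmatched))
```

If `unmatched` is empty, every neighborhood polygon has a matching census row, and a plain inner merge loses nothing.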
%%capture
pip install geopandas

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
denver = pd.read_csv('./data/denver.csv')
denver

# Convert the denver DataFrame to a GeoDataFrame and save it as a shapefile so we can plot Starbucks locations
denver_gdf = gpd.GeoDataFrame(denver)
denver_gdf.set_geometry(gpd.points_from_xy(denver_gdf['Longitude'], denver_gdf['Latitude']), inplace=True, crs='EPSG:4326')
denver_gdf.drop(['Longitude', 'Latitude'], axis = 1, inplace=True)
denver_gdf.to_file('denver.shp')
denver_gdf

neighborhoods = gpd.read_file('./data/neighborhoods.shp')
neighborhoods.sort_values('NBHD_ID')

census = pd.read_csv('./data/census.csv')
census

# Merge neighborhoods and census dataframes on their shared neighborhood ID
merged = neighborhoods.merge(census, on='NBHD_ID')
merged.sort_values('AGE_18_TO_34', ascending=False)

The Maps
We now create the maps to identify the target demographic and affluent households. We need a way to label the neighborhoods we will report back. Given the limited space on each map, we label them with the NBHD_ID column for now and look up their names later.
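Because the maps are labeled with IDs only, a small lookup from NBHD_ID to NBHD_NAME makes it easy to translate IDs read off a map back into names. A minimal sketch on made-up data might look like this; the IDs, names, and the `id_to_name` variable are hypothetical.

```python
import pandas as pd

# Made-up IDs and names standing in for the neighborhood table.
toy = pd.DataFrame({
    "NBHD_ID": [10, 20, 30],
    "NBHD_NAME": ["Alpha", "Beta", "Gamma"],
})

# Series indexed by ID, so an ID read off the map can be looked up directly.
id_to_name = toy.set_index("NBHD_ID")["NBHD_NAME"]

ids_seen_on_map = [30, 10]
names = id_to_name.loc[ids_seen_on_map].tolist()
print(names)
```

The same pattern should carry over to the real tables, assuming NBHD_ID is unique per neighborhood.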
# Plot the merged dataframe from before and use the target demographic column as the legend to identify where desirable
# neighborhoods are, then map starbucks locations on top of these
base = merged.plot(figsize=(40,12), column='AGE_18_TO_34', legend=True)
denver_gdf.plot(ax=base, marker='*', color='red', markersize=20)
# Label each neighborhood with its ID at the polygon centroid
merged.apply(lambda x: base.annotate(str(x.NBHD_ID), xy=x.geometry.centroid.coords[0], ha='center'), axis=1)
plt.title('Starbucks Locations in Denver Neighborhoods with Highest Count of Target Demographic', size=16)
target_dem_count = merged.sort_values('AGE_18_TO_34', ascending=False).head(20)
target_dem_count

# Create target proportion column and sort in descending order
merged['TARGET_PROPORTION'] = merged['AGE_18_TO_34'] / merged['POPULATION_2010']

base = merged.plot(figsize=(40,10), column='TARGET_PROPORTION', legend=True)
denver_gdf.plot(ax=base, marker='*', color='red', markersize=20)
# Label each neighborhood with its ID at the polygon centroid
merged.apply(lambda x: base.annotate(str(x.NBHD_ID), xy=x.geometry.centroid.coords[0], ha='center'), axis=1)
plt.title('Starbucks Locations in Denver Neighborhoods with Highest Proportion of Target Demographic', size=16)
target_dem_prop = merged.sort_values('TARGET_PROPORTION', ascending=False).head(20)
target_dem_prop

# Treat missing affluent-household counts as zero so they can be plotted and sorted
merged['NUM_HHLD_100K+'] = merged['NUM_HHLD_100K+'].fillna(0)
base = merged.plot(figsize=(40, 12), column='NUM_HHLD_100K+', legend=True)
denver_gdf.plot(ax=base, marker="*", color='red', markersize=20)
# Label each neighborhood with its ID at the polygon centroid
merged.apply(lambda x: base.annotate(str(x.NBHD_ID), xy=x.geometry.centroid.coords[0], ha='center'), axis=1)
plt.title('Starbucks Locations in Denver Neighborhoods with Highest Count of Affluent Households', size=16)
affluent_households = merged.sort_values('NUM_HHLD_100K+', ascending=False).head(20)
affluent_households

List of Target Neighborhoods
Now that we have our visuals, we can see there is some overlap. We will now take our sorted tables and find the top 20 neighborhoods in each one. Then, we will convert them to sets and find the intersection of these sets. This should reveal any neighborhoods that appear in all three.
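The set-intersection step can be sketched as follows. This is an illustration on made-up numbers, not the Denver results: the toy values, the `top_ids` helper, and `TOP_N = 3` (instead of 20) are assumptions chosen to keep the example small.

```python
import pandas as pd

# Made-up values for five hypothetical neighborhoods.
toy = pd.DataFrame({
    "NBHD_ID": [1, 2, 3, 4, 5],
    "AGE_18_TO_34": [500, 400, 300, 200, 100],
    "TARGET_PROPORTION": [0.10, 0.50, 0.40, 0.45, 0.20],
    "NUM_HHLD_100K+": [50, 300, 250, 10, 400],
})

TOP_N = 3  # the analysis uses 20; 3 keeps the toy example readable


def top_ids(df, column, n=TOP_N):
    """Return the set of NBHD_ID values for the n largest entries of column."""
    return set(df.nlargest(n, column)["NBHD_ID"].tolist())


by_count = top_ids(toy, "AGE_18_TO_34")
by_share = top_ids(toy, "TARGET_PROPORTION")
by_income = top_ids(toy, "NUM_HHLD_100K+")

# Neighborhoods that rank in the top N on all three criteria.
in_all_three = by_count & by_share & by_income
print(sorted(in_all_three))  # [2, 3]
```

Using sets of IDs rather than whole rows keeps the comparison independent of row order and of any extra columns in the sorted tables.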