Where Should Our Next Coffee Shop in Denver Be?

Where to open a new coffee shop?

📖 Background

You are helping a client who owns coffee shops in Colorado. The company's coffee shops serve high-quality and responsibly sourced coffee, pastries, and sandwiches. They operate three locations in Fort Collins and want to expand into Denver.

Your client believes that the ideal location for a new store is close to affluent households, and the store appeals to the 20-35 year old demographic.

Your team collected geographical and demographic information about Denver's neighborhoods to assist the search. They also collected data for Starbucks stores in Denver. Starbucks and the new coffee shops do not compete for the same clients; the team included their location as a reference.

💾 The data

You have assembled information from three different sources (locations, neighborhoods, demographics):

Starbucks locations in Denver, Colorado

"StoreNumber" - Store Number as assigned by Starbucks
"Name" - Name identifier for the store
"PhoneNumber" - Phone number for the store
"Street 1, 2, and 3" - Address for the store
"PostalCode" - Zip code of the store
"Longitude, Latitude" - Coordinates of the store

Neighborhoods' geographical information

"NBHD_ID" - Neighborhood ID (matches the census information)
"NBHD_NAME" - Name of the statistical neighborhood
"Geometry" - Polygon that defines the neighborhood

Demographic information

"NBHD_ID" - Neighborhood ID (matches the geographical information)
"NBHD_NAME" - Neighborhood name
"POPULATION_2010" - Population in 2010
"AGE_ " - Number of people in each age bracket (< 18, 18-34, 35-65, and > 65)
"NUM_HOUSEHOLDS" - Number of households in the neighborhood
"FAMILIES" - Number of families in the neighborhood
"NUM_HHLD_100K+" - Number of households with income above 100 thousand USD per year

Starbucks locations were scraped from the Starbucks store locator webpage by Chris Meller.
Statistical Neighborhood information from the City of Denver Open Data Catalog, CC BY 3.0 license.
Census information from the United States Census Bureau. Publicly available information.

%%capture
pip install geopandas

%%capture
pip install rtree

import pandas as pd
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
denver = pd.read_csv('./data/denver.csv')
neighborhoods = gpd.read_file('./data/neighborhoods.shp')
census = pd.read_csv('./data/census.csv')

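# Denver location dataset ----------------------------------------------
# Declare the coordinates as WGS 84 longitude/latitude
neighborhoods = neighborhoods.set_crs("EPSG:4326", allow_override=True)
# Project to a metric CRS and compute each neighborhood's area in square kilometers.
# Note: Web Mercator (EPSG:3857) inflates areas at Denver's latitude by a roughly
# constant factor, so absolute densities are understated while relative comparisons hold.
neighborhoods_3857 = neighborhoods.to_crs(epsg=3857)
neighborhoods['area'] = neighborhoods_3857.geometry.area / 10 ** 6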

# Census missing data imputation
imp = IterativeImputer(max_iter=10, random_state=0)
census_imp = census.drop(columns = ['NBHD_ID', 'NBHD_NAME'])
census_imp = pd.DataFrame(imp.fit_transform(census_imp), columns = census_imp.columns)
# Clip negative imputed counts to zero
census_imp.loc[census_imp['NUM_HHLD_100K+'] < 0, 'NUM_HHLD_100K+'] = 0
# Where the high-income count is not below the household count, reset it to zero
census_imp.loc[census_imp['NUM_HOUSEHOLDS'] <= census_imp['NUM_HHLD_100K+'], 'NUM_HHLD_100K+'] = 0

# Counts are whole numbers, so truncate the imputed values to integers
census_imp = census_imp.astype('int')
    
census_imputed = census[['NBHD_ID', 'NBHD_NAME']].join(census_imp)

Executive Summary

When choosing a location for a store in Denver, Starbucks has typically favored neighborhoods with the highest proportion of young adults and the lowest proportions of youngsters (under 18) and senior citizens. When it opens additional branches in the same neighborhood, it also tends to favor more affluent areas with higher population density.
The neighborhoods with our desired characteristics are mostly centered around the business district and downtown. However, no neighborhood scores consistently high on all characteristics. To identify the best spot for the new coffee shop, we ranked the neighborhoods by the weighted average of our desired characteristics.
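
The ranking by weighted average could look roughly like the sketch below. It is only an illustration: the toy values, the neighborhood labels, the helper name `rank_neighborhoods`, and above all the weights are assumptions, not the figures used in this analysis. The column names mirror those computed later in the notebook.

```python
import pandas as pd

# Toy stand-in for census_neighbors; the values are invented for illustration only.
toy = pd.DataFrame({
    'NBHD_NAME': ['A', 'B', 'C'],
    'AGE_18_TO_34_prop': [0.60, 0.30, 0.45],
    'NUM_HHLD_100K_prop': [0.10, 0.50, 0.30],
    'density_scaled': [0.80, 0.20, 0.50],
})

# Hypothetical weights (assumed, not taken from the analysis).
weights = {
    'AGE_18_TO_34_prop': 0.4,
    'NUM_HHLD_100K_prop': 0.3,
    'density_scaled': 0.3,
}

def rank_neighborhoods(df, weights):
    """Return a copy of df sorted by the weighted average of the weighted columns."""
    cols = list(weights)
    w = pd.Series(weights)
    scored = df.copy()
    # Row-wise weighted average: sum(value * weight) / sum(weights)
    scored['score'] = (scored[cols] * w).sum(axis=1) / w.sum()
    return scored.sort_values('score', ascending=False).reset_index(drop=True)

ranked = rank_neighborhoods(toy, weights)
print(ranked[['NBHD_NAME', 'score']])
```

Dividing by the sum of the weights keeps the score on the same 0 to 1 scale as the inputs even if the weights do not sum to one. Changing the weights changes the ranking, so the choice should reflect how much the client values each characteristic.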

The final candidate neighborhoods are as follows:

North Capitol Hill: This neighborhood has the highest population density. Only 3% of its residents are youngsters, and it has a high proportion of young adults (56%). It has no Starbucks store, so a new shop could be well received by coffee lovers.
University: With 62% young adults, this neighborhood could be a popular place for college students to hang out. It is not as dense as the other two candidates, but the number of students could make up for this.
Speer: With 50% young adults and a scaled population density of 60%, this neighborhood could be a hotspot for a caffeine fuel-up. It also has no Starbucks branch.
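
Whether a neighborhood has a Starbucks store can be checked with a point-in-polygon spatial join. The sketch below is a minimal illustration on invented geometry, not the code behind the summary: the toy frames and the helper name `count_stores` are assumptions, and it assumes a geopandas version whose `sjoin` accepts the `predicate` argument (0.10 or later).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Toy neighborhoods: two unit squares side by side (invented geometry).
toy_nbhd = gpd.GeoDataFrame(
    {'NBHD_NAME': ['West', 'East']},
    geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
              Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])],
    crs='EPSG:4326',
)

# Toy store table using the Longitude/Latitude columns from the data description.
toy_stores = pd.DataFrame({
    'StoreNumber': [1, 2],
    'Longitude': [0.25, 0.75],
    'Latitude': [0.5, 0.5],
})

def count_stores(stores, nbhd):
    """Count the stores that fall inside each neighborhood polygon."""
    points = gpd.GeoDataFrame(
        stores,
        geometry=gpd.points_from_xy(stores['Longitude'], stores['Latitude']),
        crs=nbhd.crs,
    )
    # Keep each store together with the neighborhood that contains it
    joined = gpd.sjoin(points, nbhd, how='inner', predicate='within')
    counts = joined.groupby('NBHD_NAME').size()
    # Neighborhoods without any store get a count of 0
    return counts.reindex(nbhd['NBHD_NAME'].tolist(), fill_value=0)

store_counts = count_stores(toy_stores, toy_nbhd)
print(store_counts[store_counts == 0].index.tolist())
```

With the real data, `toy_stores` and `toy_nbhd` would be replaced by the `denver` and `neighborhoods` frames loaded earlier, assuming both use the same coordinate reference system.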

Denver Demographic and Geographical Insights

In 2010, Denver comprised 78 neighborhoods, with its international airport in the northeast and its downtown near the center of the city, slightly to the northwest. The neighborhood with the highest proportion of young adults (18-34) is Auraria, in the city center; the lowest is Wellshire, in the southeast. Capitol Hill has the highest population density and, intuitively, the airport area has the lowest. Central Park, previously known as Stapleton, has the highest proportion of households with income above 100 thousand dollars. This proportion is lowest in Auraria, which is dominated by universities and campus sites.
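
Highest and lowest neighborhoods of this kind can be read off with `idxmax` and `idxmin`. The sketch below uses invented values and a hypothetical helper named `extremes`; only the column names follow the ones computed in this notebook.

```python
import pandas as pd

# Toy stand-in for census_neighbors, indexed by neighborhood name (invented values).
toy = pd.DataFrame({
    'NBHD_NAME': ['A', 'B', 'C'],
    'AGE_18_TO_34_prop': [0.60, 0.30, 0.45],
    'NUM_HHLD_100K_prop': [0.10, 0.50, 0.30],
    'density': [8000.0, 150.0, 4200.0],
}).set_index('NBHD_NAME')

def extremes(df, metrics):
    """For each metric, return the neighborhood with the highest and lowest value."""
    return pd.DataFrame({
        'highest': df[metrics].idxmax(),
        'lowest': df[metrics].idxmin(),
    })

summary = extremes(toy, ['AGE_18_TO_34_prop', 'NUM_HHLD_100K_prop', 'density'])
print(summary)
```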

In the following maps, we can see more detailed information on the proportion of young adults, the proportion of households with income above $100 thousand, and the population density in each neighborhood, respectively. Regions with a darker shade indicate a larger value; the names of these regions are annotated on the map.

For the rest of the analysis, the Denver International Airport (DIA) area is not shown in full. Its large area would dominate the map, and it would not be a suitable location for our store in any case.

# Data transformation ----------------------------------------------
census_neighbors = neighborhoods.merge(census_imputed, on='NBHD_NAME')
census_neighbors.drop(columns=['NBHD_ID_y'], inplace=True)
census_neighbors.rename(columns = {'NBHD_ID_x':'NBHD_ID'}, inplace=True)
# Share of the population in each age bracket
for bracket in ['AGE_LESS_18', 'AGE_18_TO_34', 'AGE_35_TO_65', 'AGE_65_PLUS']:
    census_neighbors[f'{bracket}_prop'] = census_neighbors[bracket] / census_neighbors['POPULATION_2010']
# Share of households with income above $100k
census_neighbors['NUM_HHLD_100K_prop'] = census_neighbors['NUM_HHLD_100K+'] / census_neighbors['NUM_HOUSEHOLDS']
# Population per unit of area (area is in square kilometers)
census_neighbors['density'] = census_neighbors['POPULATION_2010'] / census_neighbors['area']

census_neighbors['density_scaled'] = \
    (census_neighbors['density'] - census_neighbors['density'].min())/\
    (census_neighbors['density'].max() - census_neighbors['density'].min())

# First map
fig, ax= plt.subplots(1,1, figsize=(13, 13))

plt.title('Map 1 - Choropleth of young adults (18-34) proportion\nin each neighborhood, Denver, Colorado', size=17, weight='heavy')
census_neighbors.plot(column='AGE_18_TO_34_prop', cmap='Spectral',
                      edgecolor='white',linewidth=2, ax=ax, alpha=0.8)

def plot_annotate(name, xy, xytext, color='black'):
    ax.annotate(text=name, xy=xy, xytext=xytext, size=16, arrowprops=dict(arrowstyle="->", color=color), 
                horizontalalignment='center', color=color)
    
plot_annotate('Auraria', (-105.01, 39.75), (-105, 39.73))
plot_annotate('University', (-104.965, 39.675), (-104.965, 39.65))
plot_annotate('CBD', (-104.995, 39.746), (-105, 39.755))
plot_annotate('Capitol Hill', (-104.98, 39.732), (-104.95, 39.732))
plot_annotate('North\n Capitol Hill', (-104.98, 39.744), (-104.97, 39.755))

ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)

plt.xlim([-105.1,-104.7])
plt.ylim([39.6,39.81])
plt.axis('off')
plt.tight_layout()

The neighborhoods with the highest proportion of young adults are clustered around the business district. Unsurprisingly, the University neighborhood also has a high share of young adults.

# Second map
fig, ax= plt.subplots(1,1, figsize=(13, 13))
plt.title('Map 2 - Choropleth of households with more than $100k income proportion', size=17, weight='heavy')
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.4, hspace=None) 
census_neighbors.plot(column='NUM_HHLD_100K_prop', cmap='Spectral',
                      edgecolor='white',linewidth=2, ax=ax, alpha=0.8)


plot_annotate('Hilltop', (-104.927, 39.718), (-104.85, 39.718))
plot_annotate('Washington Park', (-104.966, 39.70), (-104.966, 39.645))
plot_annotate('Cherry Creek', (-104.95, 39.718), (-104.85, 39.68))
plot_annotate('Country Club', (-104.966, 39.721), (-104.966, 39.732))
plot_annotate('South Park Hill', (-104.922, 39.745), (-104.86, 39.74))

plt.text(s='Central\n Park', x=-104.893, y=39.773, size=14, color='white')

plt.xlim([-105.1,-104.7])
plt.ylim([39.6,39.81])

ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(False)
plt.axis('off')
plt.tight_layout()
