Finding the Optimal Coffee Shop Location

Where to open a new coffee shop?

📖 Background

You are helping a client who owns coffee shops in Colorado. The company's coffee shops serve high-quality and responsibly sourced coffee, pastries, and sandwiches. They operate three locations in Fort Collins and want to expand into Denver.

Your client believes that the ideal location for a new store is close to affluent households and that the store appeals most to the 20-35 year old demographic.

Your team collected geographical and demographic information about Denver's neighborhoods to assist the search. They also collected data for Starbucks stores in Denver. Starbucks and the new coffee shops do not compete for the same clients; the team included the Starbucks locations only as a reference.

💾 The data

You have assembled information from three different sources (locations, neighborhoods, demographics):

Starbucks locations in Denver, Colorado

"StoreNumber" - Store Number as assigned by Starbucks
"Name" - Name identifier for the store
"PhoneNumber" - Phone number for the store
"Street 1, 2, and 3" - Address for the store
"PostalCode" - Zip code of the store
"Longitude, Latitude" - Coordinates of the store

Neighborhoods' geographical information

"NBHD_ID" - Neighborhood ID (matches the census information)
"NBHD_NAME" - Name of the statistical neighborhood
"Geometry" - Polygon that defines the neighborhood

Demographic information

"NBHD_ID" - Neighborhood ID (matches the geographical information)
"NBHD_NAME" - Neighborhood name
"POPULATION_2010" - Population in 2010
"AGE_ " - Number of people in each age bracket (< 18, 18-34, 35-65, and > 65)
"NUM_HOUSEHOLDS" - Number of households in the neighborhood
"FAMILIES" - Number of families in the neighborhood
"NUM_HHLD_100K+" - Number of households with income above 100 thousand USD per year

Starbucks locations were scraped from the Starbucks store locator webpage by Chris Meller.
Statistical Neighborhood information from the City of Denver Open Data Catalog, CC BY 3.0 license.
Census information from the United States Census Bureau. Publicly available information.

💪 Challenge

Provide your client a list of neighborhoods in Denver where they should consider expanding. Include:

A visualization of Denver's neighborhoods and the Starbucks store locations.
The neighborhoods with the highest proportion of people in the target demographic.
The top three neighborhoods where your client should focus their search.

Data Collection and Preparation

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import plotly.express as px
import plotly.graph_objects as go
from shapely.geometry import mapping, Point
import folium

sns.set_style('whitegrid')

denver_data = pd.read_csv('./data/denver.csv')
census_data = pd.read_csv('./data/census.csv')
neighborhoods = gpd.read_file('./data/neighborhoods.shp')

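# Centroid of each neighborhood polygon, in the CRS of the shapefile
neighborhoods['center'] = neighborhoods.geometry.centroid
part = ['NBHD_NAME', 'center']
neighborhood_centers = neighborhoods[part]
# Use Auraria's centroid as the map center; .iloc[0] selects the first match
# by position, since the filtered row does not necessarily carry index label 0
neighborhood_one = neighborhoods.loc[neighborhoods.NBHD_NAME == "Auraria"]
center_point = neighborhood_one['center'].iloc[0]
# [y, x] order, i.e. [latitude, longitude] if the CRS is geographic
neighborhood_center = [center_point.y, center_point.x]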

Check Values

# Check for missing values in census data
census_data.isnull().sum()

# Check for missing values in Denver Starbucks data
denver_data.isnull().sum()

# Convert the count columns to float in a single step
count_cols = ['POPULATION_2010', 'AGE_LESS_18', 'AGE_18_TO_34', 'AGE_35_TO_65',
              'AGE_65_PLUS', 'NUM_HOUSEHOLDS', 'FAMILIES', 'NUM_HHLD_100K+']
census_data[count_cols] = census_data[count_cols].astype(float)
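
The conversion to float fails if a column contains a non-numeric string. A minimal sketch of a more tolerant conversion with pd.to_numeric and errors="coerce" follows. The toy frame and its values are hypothetical and are not taken from census.csv; only the two column names come from this notebook.

```python
import pandas as pd

# Hypothetical raw values; "n/a" stands in for a non-numeric entry
raw = pd.DataFrame({
    "POPULATION_2010": ["1200", "850", "n/a"],
    "NUM_HHLD_100K+": ["300", None, "45"],
})

numeric_cols = ["POPULATION_2010", "NUM_HHLD_100K+"]

# errors="coerce" turns unparseable entries into NaN instead of raising
converted = raw[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
print(converted.isnull().sum())
```

Coerced entries show up in the later null count, so they can be handled together with the values that were already missing.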

Null values

census_data.isnull().sum()

# Fill null instances of NUM_HHLD_100K+ with the column median
median_val = census_data['NUM_HHLD_100K+'].median()

census_data['NUM_HHLD_100K+'] = census_data['NUM_HHLD_100K+'].fillna(median_val)
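
Whether to fill with the median or with zero is a modeling assumption, and it changes how the affected neighborhoods rank. The sketch below compares the two on a hypothetical series (the values are made up for illustration) and keeps a flag for imputed rows so they can be reviewed later. The flag is an addition for illustration and is not part of the notebook above.

```python
import pandas as pd

# Hypothetical household counts with one missing value
s = pd.Series([100.0, 200.0, None, 900.0], name="NUM_HHLD_100K+")

was_missing = s.isnull()               # remember which rows get imputed
filled_median = s.fillna(s.median())   # median of the observed values
filled_zero = s.fillna(0)              # alternative: treat missing as zero

print(filled_median.tolist())
print(filled_zero.tolist())
print(was_missing.tolist())
```

Median filling places the missing neighborhood in the middle of the distribution, while zero filling pushes it to the bottom of any affluence ranking.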

EDA

Distribution

# Set up the matplotlib figure
plt.figure(figsize=(15, 4))

# Histogram for households earning above 100K
plt.subplot(1, 2, 1)
sns.histplot(census_data['NUM_HHLD_100K+'], bins=20, kde=True)
plt.title('Distribution of Households Earning Above 100K')
plt.xlabel('Number of Households')
plt.ylabel('Frequency')

# Histogram for population aged 18-34
plt.subplot(1, 2, 2)
sns.histplot(census_data['AGE_18_TO_34'], bins=20, kde=True)
plt.title('Distribution of Population Aged 18-34')
plt.xlabel('Number of People')
plt.ylabel('Frequency')

# Show plots
plt.tight_layout()
plt.show()
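
The histograms above show raw counts, while the challenge asks for proportions. A minimal sketch of that step follows, assuming the 18-34 bracket is the closest available match for the 20-35 target group. The toy rows and the PCT_ column names are hypothetical; the other column names are the ones used in this notebook.

```python
import pandas as pd

# Toy rows with made-up values, using the census column names from this notebook
toy = pd.DataFrame({
    "NBHD_NAME": ["A", "B", "C"],
    "POPULATION_2010": [1000.0, 4000.0, 2000.0],
    "AGE_18_TO_34": [500.0, 1000.0, 800.0],
    "NUM_HOUSEHOLDS": [400.0, 1600.0, 1000.0],
    "NUM_HHLD_100K+": [100.0, 800.0, 200.0],
})

# Share of residents aged 18-34 and share of households above 100K USD
toy["PCT_18_TO_34"] = toy["AGE_18_TO_34"] / toy["POPULATION_2010"]
toy["PCT_HHLD_100K"] = toy["NUM_HHLD_100K+"] / toy["NUM_HOUSEHOLDS"]

# Rank neighborhoods by the share of the target age group
ranked = toy.sort_values("PCT_18_TO_34", ascending=False)
print(ranked[["NBHD_NAME", "PCT_18_TO_34", "PCT_HHLD_100K"]])
```

Proportions keep large neighborhoods from dominating the ranking purely because of their size. A neighborhood with zero population or zero households would need separate handling before the division.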
