Where to open a new coffee shop?
📖 Background
You are helping a client who owns coffee shops in Colorado. The company's coffee shops serve high-quality and responsibly sourced coffee, pastries, and sandwiches. They operate three locations in Fort Collins and want to expand into Denver.
Your client believes that the ideal location for a new store is close to affluent households, and the store appeals to the 20-35 year old demographic.
Your team collected geographical and demographic information about Denver's neighborhoods to assist the search. They also collected data for Starbucks stores in Denver. Starbucks and the new coffee shops do not compete for the same clients; the team included their location as a reference.
💾 The data
You have assembled information from three different sources (locations, neighborhoods, demographics):
Starbucks locations in Denver, Colorado
- "StoreNumber" - Store Number as assigned by Starbucks
- "Name" - Name identifier for the store
- "PhoneNumber" - Phone number for the store
- "Street 1, 2, and 3" - Address for the store
- "PostalCode" - Zip code of the store
- "Longitude, Latitude" - Coordinates of the store
Neighborhoods' geographical information
- "NBHD_ID" - Neighborhood ID (matches the census information)
- "NBHD_NAME" - Name of the statistical neighborhood
- "Geometry" - Polygon that defines the neighborhood
Demographic information
- "NBHD_ID" - Neighborhood ID (matches the geographical information)
- "NBHD_NAME" - Neighborhood name
- "POPULATION_2010" - Population in 2010
- "AGE_ " - Number of people in each age bracket (< 18, 18-34, 35-65, and > 65)
- "NUM_HOUSEHOLDS" - Number of households in the neighborhood
- "FAMILIES" - Number of families in the neighborhood
- "NUM_HHLD_100K+" - Number of households with income above 100 thousand USD per year
Starbucks locations were scraped from the Starbucks store locator webpage by Chris Meller.
Statistical Neighborhood information from the City of Denver Open Data Catalog, CC BY 3.0 license.
Census information from the United States Census Bureau. Publicly available information.
1. Let's import the needed libraries
%%capture
pip install geopandas
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
2. First let's have a look at the data
2.1 The Denver CSV file contains the locations of all Starbucks stores in Denver. To preview the information it includes, we will read it into a pandas DataFrame named denver_starbucks.
denver_starbucks = pd.read_csv('./data/denver.csv')
denver_starbucks.head()
2.2 As the neighborhoods' geographical data is stored as a shapefile, we will use the GeoPandas library to read it into a GeoDataFrame that we will call 'neighborhoods'.
neighborhoods = gpd.read_file('./data/neighborhoods.shp')
print(neighborhoods.shape)
neighborhoods.head()
The neighborhoods GeoDataFrame has only three columns and 78 rows, which tells us that Denver is divided into 78 statistical neighborhoods.
2.3 Lastly, the census data describes the demographics of each neighborhood. The most important information for us is the age brackets and the number of households with a yearly income above $100K.
census = pd.read_csv('./data/census.csv')
census
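The data description states that "NBHD_ID" matches between the geographical and the census tables. Before relying on that, it can be worth confirming it. The following is a minimal sketch on toy data, not the actual files: the two small DataFrames and their values are made up for illustration, and only the "NBHD_ID", "NBHD_NAME", and "POPULATION_2010" column names come from the data description.

```python
import pandas as pd

# Toy stand-ins for the two tables (values are invented for illustration).
toy_neighborhoods = pd.DataFrame(
    {"NBHD_ID": [1, 2, 3], "NBHD_NAME": ["A", "B", "C"]}
)
toy_census = pd.DataFrame(
    {"NBHD_ID": [1, 2, 4], "POPULATION_2010": [1000, 2000, 500]}
)

# An outer merge with indicator=True adds a "_merge" column that says
# whether each NBHD_ID was found in the left table, the right table, or both.
check = toy_neighborhoods.merge(
    toy_census, on="NBHD_ID", how="outer", indicator=True
)

# IDs that appear in only one of the two tables.
unmatched = check[check["_merge"] != "both"]
unmatched_ids = sorted(unmatched["NBHD_ID"].tolist())
print(unmatched_ids)
```

On the real data, an empty list would mean that every neighborhood polygon has a census row and vice versa.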
3. First Understandings & Data Cleaning
For the rest of the exercise we will assume that the situation described takes place in 2010, since our data is based on the 2010 Census. Most of the information we need comes from the census data, which describes the target audience for the new coffee shop:
- 20-35 age group
- Affluent Households.
We can approximate the target age group of the client (20-35) to the age bracket 18-34. Moreover, we will define "affluent households" as households that have a yearly income of more than $100K.
Therefore, we will:
- Drop the neighborhoods that don't have any households with a yearly income above $100K.
- Create a new column 'ratio_18_34' that calculates the ratio of 18-34 year olds in the neighborhood.
- Create a new column 'ratio_affluent_hhld' that calculates the ratio of 'affluent' households in the neighborhood.
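The three census steps above can be sketched as follows. This is an illustration on toy data under stated assumptions, not the final cleaning code: the numbers are made up, and "AGE_18_TO_34" is an assumed name for the 18-34 bracket column (the real name should be checked in census.columns). The sketch also assumes each ratio is taken against "POPULATION_2010" and "NUM_HOUSEHOLDS" respectively.

```python
import pandas as pd

# Toy stand-in for the census table (values are invented for illustration).
# "AGE_18_TO_34" is an assumed column name for the 18-34 age bracket.
toy_census = pd.DataFrame(
    {
        "NBHD_ID": [1, 2, 3],
        "NBHD_NAME": ["A", "B", "C"],
        "POPULATION_2010": [1000, 2000, 500],
        "AGE_18_TO_34": [250, 1000, 100],
        "NUM_HOUSEHOLDS": [400, 800, 200],
        "NUM_HHLD_100K+": [100, 200, 0],
    }
)

# Step 1: keep only neighborhoods with at least one household above $100K.
# The comparison is False for missing values, so those rows are dropped too.
toy_census = toy_census[toy_census["NUM_HHLD_100K+"] > 0].copy()

# Step 2: share of residents aged 18-34.
toy_census["ratio_18_34"] = (
    toy_census["AGE_18_TO_34"] / toy_census["POPULATION_2010"]
)

# Step 3: share of households earning more than $100K per year.
toy_census["ratio_affluent_hhld"] = (
    toy_census["NUM_HHLD_100K+"] / toy_census["NUM_HOUSEHOLDS"]
)

print(toy_census[["NBHD_NAME", "ratio_18_34", "ratio_affluent_hhld"]])
```

Using ratios rather than raw counts keeps large neighborhoods from dominating the comparison simply because they have more residents.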
In addition, we will also drop the columns that won't be necessary to the rest of the exercise.
- For the denver_starbucks DataFrame we will keep the 'Name', 'Longitude', and 'Latitude' columns.
- For the neighborhoods GeoDataFrame we will keep all three columns as they will serve us later to create our maps.
- For the census DataFrame we will keep the 'NBHD_ID', 'NBHD_NAME', 'ratio_18_34', and 'ratio_affluent_hhld' columns.
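Selecting the columns listed above could look like the sketch below. It runs on toy DataFrames with made-up values so that it is self-contained; the toy frames and the starbucks_cols and census_cols helper lists are hypothetical, and the same selection would be applied to denver_starbucks and census once the ratio columns exist.

```python
import pandas as pd

# Toy stand-ins (values are invented for illustration).
toy_starbucks = pd.DataFrame(
    {
        "StoreNumber": ["X-1"],
        "Name": ["Example store"],
        "PhoneNumber": ["000-000-0000"],
        "PostalCode": ["80202"],
        "Longitude": [-104.99],
        "Latitude": [39.75],
    }
)
toy_census = pd.DataFrame(
    {
        "NBHD_ID": [1],
        "NBHD_NAME": ["A"],
        "POPULATION_2010": [1000],
        "NUM_HOUSEHOLDS": [400],
        "ratio_18_34": [0.25],
        "ratio_affluent_hhld": [0.25],
    }
)

# Selecting with a list of names keeps only those columns, in that order.
starbucks_cols = ["Name", "Longitude", "Latitude"]
census_cols = ["NBHD_ID", "NBHD_NAME", "ratio_18_34", "ratio_affluent_hhld"]

toy_starbucks = toy_starbucks[starbucks_cols]
toy_census = toy_census[census_cols]

print(toy_starbucks.columns.tolist())
print(toy_census.columns.tolist())
```

Selecting the columns to keep, rather than listing the ones to drop, raises a KeyError if an expected column is missing, which surfaces naming mistakes early.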