Where to open a new coffee shop?

📖 Background

You are helping a client who owns coffee shops in Colorado. The company's coffee shops serve high-quality and responsibly sourced coffee, pastries, and sandwiches. They operate three locations in Fort Collins and want to expand into Denver.

Your client believes that the ideal location for a new store is close to affluent households, and the store appeals to the 20-35 year old demographic.

Your team collected geographical and demographic information about Denver's neighborhoods to assist the search. They also collected data for Starbucks stores in Denver. Starbucks and the new coffee shops do not compete for the same clients; the team included their location as a reference.

💾 The data

You have assembled information from three different sources (locations, neighborhoods, demographics):

Starbucks locations in Denver, Colorado
  • "StoreNumber" - Store Number as assigned by Starbucks
  • "Name" - Name identifier for the store
  • "PhoneNumber" - Phone number for the store
  • "Street 1, 2, and 3" - Address for the store
  • "PostalCode" - Zip code of the store
  • "Longitude, Latitude" - Coordinates of the store
Neighborhoods' geographical information
  • "NBHD_ID" - Neighborhood ID (matches the census information)
  • "NBHD_NAME" - Name of the statistical neighborhood
  • "Geometry" - Polygon that defines the neighborhood
Demographic information
  • "NBHD_ID" - Neighborhood ID (matches the geographical information)
  • "NBHD_NAME" - Neighborhood name
  • "POPULATION_2010" - Population in 2010
  • "AGE_ " - Number of people in each age bracket (< 18, 18-34, 35-65, and > 65)
  • "NUM_HOUSEHOLDS" - Number of households in the neighborhood
  • "FAMILIES" - Number of families in the neighborhood
  • "NUM_HHLD_100K+" - Number of households with income above 100 thousand USD per year

Starbucks locations were scraped from the Starbucks store locator webpage by Chris Meller.
Statistical Neighborhood information from the City of Denver Open Data Catalog, CC BY 3.0 license.
Census information from the United States Census Bureau. Publicly available information.

import pandas as pd
import geopandas as gpd
denver = pd.read_csv('./data/denver.csv')
denver
neighborhoods = gpd.read_file('./data/neighborhoods.shp')
neighborhoods
census = pd.read_csv('./data/census.csv')
census

💪 Challenge

Provide your client a list of neighborhoods in Denver where they should consider expanding. Include:

  • A visualization of Denver's neighborhoods and the Starbucks store locations.
  • Find the neighborhoods with the highest proportion of people in the target demographic.
  • Select the top three neighborhoods where your client should focus their search.

Note:

To ensure the best user experience, we currently discourage using Folium and Bokeh in Workspace notebooks.

🧑‍⚖️ Judging criteria

This is a community-based competition. The top 5 most upvoted entries will win.

The winners will receive DataCamp merchandise.

✅ Checklist before publishing

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the judging criteria, so the workbook is focused on your work.
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

Hello, I'm Brian, and I'm attempting this challenge to reinforce my learning and educate myself on aspects that are new to me in this challenge.

The challenge will be broken down into 6 steps.

  1. Analysing each dataframe individually
  2. Merging the Neighbourhood and Census data
  3. Mapping the neighbourhoods geographically using Plotly
  4. Mapping the population based on the merged data and the Starbucks Denver data
  5. Plotting a heat map of where the target age bracket (18-34) is concentrated
  6. Suggesting optimal locations for the client's new coffee shop based on the previous steps

With that said, let us start by analysing each dataframe individually.

Analysis of the Dataframe

With three separate dataframes, it is worth documenting an analysis of each one, both to restate what it contains and to clarify how I intend to use it, which makes the dataframes easier to navigate later in the challenge.

Firstly, the Starbucks Denver data lists the Starbucks stores within Denver. Its main purpose is to show where Starbucks is already established, so that those locations can be compared with the geographical data from the Neighbourhood data. It includes:

  • StoreNumber: The store ID assigned by Starbucks. It identifies the store and lets Starbucks link its own data, but it isn't very useful in this challenge.
  • Name: Name of the store. Like StoreNumber, it is only used for identification.
  • PhoneNumber: Phone number of the store. It is meant for customers or for Starbucks' own analysts and is not useful for this analysis.
  • Street1, 2 and 3: Address of the store. This is useful for checking against the Neighbourhood data that each store is assigned to the correct neighbourhood. It is a manual check rather than an input to the analysis.
  • PostalCode: Postal code of the store. It could help when pinning down a store's exact location, but an exact location is not necessary here, as only a rough location is needed for clustering.
  • Longitude, Latitude: Coordinates of the store. These will be the main fields for comparing and clustering against the Neighbourhood data. If the stores' coordinates are used as the cluster points, they can help determine better locations for the new coffee shop.

Secondly, the Neighbourhood data describes the geography of Denver's neighbourhoods. It is mainly merged with the Census data so that the census figures can be placed on a map. It is also a good way for analysts who do not live in Denver to familiarise themselves with the city. It includes:

  • NBHD_ID: The ID of each neighbourhood. This is the key needed to merge or join with the Census data, so that the same neighbourhood is linked across both datasets.
  • NBHD_NAME: The neighbourhood name. This is mostly for manually checking the merging and clustering.
  • Geometry: This is the bulk of the dataframe and its main field of interest. It holds the polygon that defines each neighbourhood's boundary. It is essential to derive a longitude and latitude for each neighbourhood from this polygon, so that those two values can be compared and clustered with the Starbucks Denver data.
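One way to get a single longitude and latitude per neighbourhood is sketched below. It is only an illustration on toy squares, not the real shapefile, and the column names LONGITUDE and LATITUDE are my own choice. It uses GeoPandas' representative_point(), which returns a point guaranteed to lie within each polygon; the centroid would be an alternative, but it can fall outside an irregularly shaped neighbourhood.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Toy stand-in for neighborhoods.shp: two made-up squares in lon/lat (EPSG:4326).
neighborhoods = gpd.GeoDataFrame(
    {"NBHD_ID": [1, 2], "NBHD_NAME": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]),
    ],
    crs="EPSG:4326",
)

# One point per polygon, guaranteed to lie within that polygon.
points = neighborhoods.geometry.representative_point()

# Hypothetical column names for the derived coordinates.
neighborhoods["LONGITUDE"] = points.x
neighborhoods["LATITUDE"] = points.y

print(neighborhoods[["NBHD_NAME", "LONGITUDE", "LATITUDE"]])
```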

Thirdly, the Census data describes the population of each neighbourhood. Linked with the Neighbourhood data, it shows which areas have many people in the target age range. The client's target is 20-35, and the closest bracket in the data is 18-34. Even without the age breakdown, the total population provides some information for recommending locations for the new coffee shop. It includes:

  • NBHD_ID: The ID of each neighbourhood. As in the Neighbourhood data, this is the key used to merge the two datasets.
  • NBHD_NAME: The neighbourhood name, as in the Neighbourhood data.
  • POPULATION_2010: Population recorded in 2010. This is assumed to be the total population of the neighbourhood. It is useful for seeing the population in general and as the denominator when calculating each age bracket as a percentage of the total.
  • AGE...: Population of each neighbourhood broken down by age bracket. This is one of the main factors that determines the result of the analysis. The focus will be on the 18-34 bracket, but the other brackets are worth noting as well.
  • NUM_HOUSEHOLDS: Number of households in the neighbourhood. This will not be a very big factor in the challenge, but it is good to note.
  • FAMILIES: Number of families in the neighbourhood. This also will not be a very big factor in the challenge, but it is good to note.
  • NUM_HHLD_100K+: Number of households with income above 100 thousand USD per year. This will also not be a very big factor in the challenge, but it is good to note.
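As a rough sketch of how the age bracket can be compared with the total population, the snippet below computes the share of 18-34 year olds per neighbourhood and ranks the neighbourhoods by it. The numbers are made up, and the column name AGE_18_TO_34 is an assumption, since the data description only lists the age columns as "AGE_ ". PCT_18_TO_34 is a name I introduce for the result.

```python
import pandas as pd

# Toy stand-in for census.csv; all values are invented for illustration.
census = pd.DataFrame(
    {
        "NBHD_ID": [1, 2, 3],
        "NBHD_NAME": ["A", "B", "C"],
        "POPULATION_2010": [1000, 2000, 500],
        "AGE_18_TO_34": [400, 500, 300],  # assumed column name
    }
)

# Share of the target bracket as a percentage of the total population.
census["PCT_18_TO_34"] = 100 * census["AGE_18_TO_34"] / census["POPULATION_2010"]

# Highest share first.
ranked = census.sort_values("PCT_18_TO_34", ascending=False)
print(ranked[["NBHD_NAME", "PCT_18_TO_34"]])
```

Ranking by share rather than by raw count keeps small but young neighbourhoods visible, although a small neighbourhood with a high share may still contain few people in absolute terms.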

EDA of the Dataframes

It is important to check the dataframes for any errors that need correcting, such as wrong data types or missing data. Fortunately, the dataframes have few rows, so it is simple to check them manually.

Let's go through the dataframes one by one.

For the Starbucks Denver data,

  1. The first thing that stands out is the null values in the street columns. This is not an error: most likely the store's address has no additional lines beyond the first street line. It can be safely ignored.
  2. The postal code is written in different formats: some have 5 digits, while most have 9 digits. This is normal, as the 9-digit form is a ZIP+4 code, which adds four digits to identify a specific sector or block. The hyphen is omitted in this data, but since the postal code is not vital to the analysis, no cleaning is necessary.
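Although I am not cleaning the postal code, the sketch below shows how the two formats could be reduced to a plain 5-digit ZIP if that were ever needed. The helper name to_zip5 is hypothetical, and it assumes every value has either 5 or 9 digits, with or without a hyphen.

```python
def to_zip5(postal_code) -> str:
    """Return the 5-digit ZIP from a 5-digit or ZIP+4 value (hyphen optional)."""
    # Keep digits only, so "80202-1234" and "802021234" are treated alike.
    digits = "".join(ch for ch in str(postal_code) if ch.isdigit())
    if len(digits) not in (5, 9):
        raise ValueError(f"unexpected postal code: {postal_code!r}")
    # The first five digits are the base ZIP; the last four are the +4 add-on.
    return digits[:5]


for raw in ["80202", "802021234", "80202-1234"]:
    print(raw, "->", to_zip5(raw))
```

With the real dataframe this could be applied as denver["PostalCode"].map(to_zip5), assuming the column has no missing values.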

For the Neighbourhood data, after converting the file from .shp to .csv, there appear to be no problems with the values: none are missing and none seem out of place. However, it will be useful during the data transformation later to turn the geometry data into longitude and latitude.

For the Census data, the only missing values are in the NUM_HHLD_100K+ column. As this column is not that important in this challenge, no data cleaning will be done. However, it is an important point to keep in mind.
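The manual check for missing values can also be done in code. The sketch below counts missing values per column and lists the affected rows, using a toy dataframe with invented numbers rather than the real census file.

```python
import pandas as pd

# Toy stand-in for census.csv with one missing income value (invented data).
census = pd.DataFrame(
    {
        "NBHD_ID": [1, 2, 3],
        "POPULATION_2010": [1000, 2000, 500],
        "NUM_HHLD_100K+": [120.0, None, 45.0],
    }
)

# Number of missing values in each column; show only columns that have any.
missing = census.isna().sum()
print(missing[missing > 0])

# The rows that would be affected if this column were used later.
incomplete = census[census["NUM_HHLD_100K+"].isna()]
print(incomplete)
```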

Data Transformation

To yield results, we first ask ourselves which questions the dataset should answer. These are:

  1. What is the population density of each neighbourhood?
  2. What is the percentage of 18-34 year olds in each neighbourhood?
  3. Where are the Starbucks stores generally located?
  4. Which area has the fewest Starbucks stores?

With these questions answered, we can finally answer the last one:

  1. Which neighbourhoods are the best places to open the new coffee shop?
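The questions about where the Starbucks stores are, and which areas have the fewest, can be approached with a spatial join. The sketch below is a minimal example on toy data, not the real files: it turns store coordinates into points, assigns each point to the polygon it falls within, and counts stores per neighbourhood. The column name NUM_STARBUCKS is my own, and the predicate keyword assumes a reasonably recent GeoPandas release (older versions called it op).

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Toy neighbourhoods: two made-up squares in lon/lat.
neighborhoods = gpd.GeoDataFrame(
    {"NBHD_ID": [1, 2], "NBHD_NAME": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]),
    ],
    crs="EPSG:4326",
)

# Toy stores: both fall inside neighbourhood A, none inside B.
denver = pd.DataFrame(
    {
        "StoreNumber": ["s1", "s2"],
        "Longitude": [0.5, 1.5],
        "Latitude": [1.0, 1.0],
    }
)

# Build point geometries from the longitude and latitude columns.
stores = gpd.GeoDataFrame(
    denver,
    geometry=gpd.points_from_xy(denver["Longitude"], denver["Latitude"]),
    crs="EPSG:4326",
)

# Attach to each store the neighbourhood whose polygon contains it.
joined = gpd.sjoin(stores, neighborhoods, how="inner", predicate="within")

# Count stores per neighbourhood, keeping neighbourhoods with zero stores.
counts = (
    joined.groupby("NBHD_ID")
    .size()
    .reindex(neighborhoods["NBHD_ID"], fill_value=0)
    .rename("NUM_STARBUCKS")
)
print(counts.sort_values())
```

The reindex step matters here: without it, neighbourhoods with no stores would simply be missing from the result instead of showing a count of zero.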

Merging of the Neighbourhood and Census Data

To get a good estimate and plot of the population density across neighbourhoods, we first merge the two datasets so that the population can be mapped onto each neighbourhood's geometry.
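A minimal sketch of such a merge is shown below, using plain pandas dataframes with invented values in place of the real files. It joins on NBHD_ID, drops the duplicate NBHD_NAME column from the census side, and uses the validate and indicator options to confirm that every neighbourhood matches exactly one census row.

```python
import pandas as pd

# Toy stand-ins for the two datasets (invented values).
neighborhoods = pd.DataFrame(
    {"NBHD_ID": [1, 2, 3], "NBHD_NAME": ["A", "B", "C"]}
)
census = pd.DataFrame(
    {
        "NBHD_ID": [1, 2, 3],
        "NBHD_NAME": ["A", "B", "C"],
        "POPULATION_2010": [1000, 2000, 500],
    }
)

merged = neighborhoods.merge(
    census.drop(columns="NBHD_NAME"),  # avoid NBHD_NAME_x / NBHD_NAME_y
    on="NBHD_ID",
    how="left",
    validate="one_to_one",  # raise if an ID is duplicated on either side
    indicator=True,  # adds a _merge column showing where each row matched
)

# Neighbourhoods without census data would appear here.
unmatched = merged[merged["_merge"] != "both"]
print(merged)
print("unmatched rows:", len(unmatched))
```

When the left side is the GeoDataFrame read from the shapefile, calling merge on it in the same way keeps the geometry column, so the result can still be plotted.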
