A simple view of the targets' distribution pinpoints 3 promising neighborhoods for new coffee shop locations

Introduction

To support the opening of a new coffee shop, I was asked to identify three promising neighborhoods based on the target clients' age and household income.

The challenge lies in finding the locations where both target variables simultaneously exceed those of the other locations. With this in mind, I calculated the proportions of the census variables that most closely represent the target clients' age and household income: specifically, the population between 18 and 34 years old and the households with an income above 100K USD per year. Please note that the age group I selected does not fully characterize the target clients' age, because it excludes people who are 35 years old and includes people younger than the target age. Furthermore, the next age group (between 35 and 65 years old) is too broad to be included. Thus, I assumed that the target age is best approximated by the 18 to 34 age group. Additionally, the Starbucks locations were used as a reference, but did not influence my decision, because they were not considered competitors.

With these assumptions and notes in mind, my work is organized in three parts: (1) data processing; (2) illustration of the neighborhoods' target population and Starbucks locations; and (3) a simple view of the targets' distribution.

Based on those steps, the top 3 locations I suggest considering are:

  • Central Park;
  • Hilltop;
  • South Park Hill.

Results

1 - Data processing

# Load necessary packages
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(sf))
suppressPackageStartupMessages(library(ggplot2))

# Load necessary data
denver <- readr::read_csv('data/denver.csv', show_col_types = FALSE)
neighborhoods <- st_read("data/neighborhoods.shp", quiet=TRUE)
census <- readr::read_csv('data/census.csv', show_col_types = FALSE)

In summary, I joined the census data to the neighborhoods data, so that the target variables can be viewed on maps. Then, I removed the neighborhoods with no recorded households earning more than 100K USD per year, because they are outside the target population. Finally, I calculated the percentage of the population between 18 and 34 years old, as well as the percentage of households with an income above 100K USD per year.

# Join neighborhoods data with the census data
neighborhoods_census <- neighborhoods %>% left_join(census)

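# Clean data and calculate necessary variables
neighborhoods_census_clean <- neighborhoods_census %>%
	# Backticks refer to the column; a quoted string is never NA, so it would filter nothing
	filter(!is.na(`NUM_HHLD_100K+`))  %>%
	# NOTE: AGE_LESS_18 counts residents under 18, while the report targets ages 18 to 34;
	# confirm that this is the intended census column
	mutate(ProportionYoungPeople = AGE_LESS_18*100/POPULATION_2010,
           ProportionHousesAbove100K = `NUM_HHLD_100K+`*100/NUM_HOUSEHOLDS)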
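
To make the cleaning and percentage steps concrete, here is a minimal base R sketch on a toy table. The neighborhood names and counts are made up for illustration; only the column names follow the census data used above. It shows why a column whose name contains `+` must be referenced as a column (with backticks or `[[ ]]`) rather than as a quoted string.

```r
# Toy data: hypothetical values, column names mirroring the census file
toy <- data.frame(
  NBHD_NAME = c("A", "B", "C"),
  NUM_HOUSEHOLDS = c(200, 400, 100),
  `NUM_HHLD_100K+` = c(50, NA, 40),
  check.names = FALSE,        # keep the "+" in the column name
  stringsAsFactors = FALSE
)

# A quoted name is just a character constant, so this is always FALSE
# and would not drop any row if used as a filter condition
quoted_check <- is.na("NUM_HHLD_100K+")

# Referencing the column itself finds the missing value in row "B"
keep <- !is.na(toy[["NUM_HHLD_100K+"]])
toy_clean <- toy[keep, ]

# Percentage of households above the income threshold
toy_clean$ProportionHousesAbove100K <-
  toy_clean[["NUM_HHLD_100K+"]] * 100 / toy_clean$NUM_HOUSEHOLDS

print(toy_clean)
```

In this toy example, neighborhood "B" is dropped because its household count is missing, and the remaining rows get percentages of 25 and 40.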

2 - Illustration of the neighborhoods' target population and Starbucks locations

# Plot the map with proportion of target age group
neighborhoods_census_clean %>% ggplot()+
	geom_sf(aes(fill=ProportionYoungPeople)) +
	geom_point(data=denver,aes(x=Longitude,y=Latitude),shape=14)+
	scale_fill_gradient(low = "grey",high="blue")+
	labs(title="Percentage of the population between 18 and 34 years old
Points indicate Starbucks locations",fill="%")+
	theme_void()

# Plot the map with proportion of target households
neighborhoods_census_clean %>% ggplot()+
	geom_sf(aes(fill=ProportionHousesAbove100K)) +
	geom_point(data=denver,aes(x=Longitude,y=Latitude))+
	scale_fill_gradient(low = "grey",high="blue")+
	labs(title="Percentage of households with income above 100 thousand USD per year 
Points indicate Starbucks locations",fill="%")+
	theme_void()

3 - Simple view of the targets distribution

In most neighborhoods, around 20% of the population was between 18 and 34 years old. Thus, it is reasonable to set the bar for the new coffee shop at neighborhoods where more than 20% of the population is in the 18 to 34 age range.

# See distribution of proportion of young people
neighborhoods_census_clean %>% 
	ggplot(aes(ProportionYoungPeople))+
	geom_histogram(bins=10)+
	geom_vline(xintercept=20,color="blue",linetype="dashed")+
    labs(y = "Number of neighborhoods",
        x= "Percentage of the population between 18 and 34 years",
        title = "Distribution of neighborhoods with population between 18 and 34 years")+
	theme_bw()
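
A threshold read off a histogram can also be checked numerically. The sketch below is only an illustration: `share_above()` is a hypothetical helper and `pct_young` holds made-up percentages, not the Denver values. On the real data, the same calls could be applied to the `ProportionYoungPeople` column.

```r
# Hypothetical helper: percentage of values strictly above a threshold
share_above <- function(x, threshold) {
  mean(x > threshold, na.rm = TRUE) * 100
}

# Made-up neighborhood percentages, for illustration only
pct_young <- c(12, 18, 19, 21, 22, 25, 30, 41)

# Typical value across neighborhoods
typical <- median(pct_young)

# Share of neighborhoods that clear the 20% bar
above_bar <- share_above(pct_young, 20)

print(c(median = typical, share_above_20 = above_bar))
```

If the median sits close to the chosen bar, roughly half of the neighborhoods clear it, which is what makes the bar a sensible first filter rather than a strict one.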

Furthermore, few neighborhoods had more than 40% of households with an income above 100K USD per year. Thus, we can set the bar for the new coffee shop at neighborhoods where more than 40% of households are in the target income range.

# See distribution of proportion of target households
neighborhoods_census_clean %>% 
	ggplot(aes(ProportionHousesAbove100K))+
	geom_histogram(bins=10)+
	geom_vline(xintercept=40,color="blue",linetype="dashed")+
    labs(y="Number of neighborhoods",
        x = "Percentage of households with income above 100 thousand USD per year",
         title = "Distribution of neighborhoods with income above 100 thousand USD per year")+
	theme_bw()

By plotting the percentages of both target variables against each other and applying the two thresholds chosen above, it is clear that Central Park, Hilltop and South Park Hill stand out from the remaining neighborhoods.

# Add top 3 locations to the combined dataset
neighborhoods_census_clean <- 
neighborhoods_census_clean  %>% 
mutate(Top_3_locations = 
           ifelse(ProportionYoungPeople >= 20 & ProportionHousesAbove100K >=40,NBHD_NAME,"Not a target neighborhood"))

# Re-level factor variable
neighborhoods_census_clean$Top_3_locations <- neighborhoods_census_clean$Top_3_locations %>% 
	factor(levels = c("Central Park","Hilltop","South Park Hill","Not a target neighborhood"))

# Percentage of young people against percentage of households with target income
neighborhoods_census_clean  %>% 
	filter(!is.na(Top_3_locations)) %>% 
	ggplot(aes(ProportionYoungPeople,ProportionHousesAbove100K,col=Top_3_locations))+
	geom_point()+
	ylim(c(0,100))+
	xlim(c(0,100))+
	geom_hline(yintercept = 40,color="blue",linetype="dashed")+
	geom_vline(xintercept = 20,color="blue",linetype="dashed")+
    labs(y="Percentage of households with income above 100 thousand USD per year",
        x = "Percentage of the population between 18 and 34 years",
        color="Target neighborhoods")+
	theme_bw()
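
The neighborhoods in the upper-right quadrant of this plot can also be listed directly rather than read off the chart. The following is a minimal sketch under stated assumptions: `meets_both()` is a hypothetical helper, the toy rows are invented, and only the column names and the 20% and 40% thresholds come from the analysis above.

```r
# Toy data: invented values, column names as in the analysis above
toy <- data.frame(
  NBHD_NAME = c("A", "B", "C", "D"),
  ProportionYoungPeople = c(25, 15, 30, 22),
  ProportionHousesAbove100K = c(45, 60, 20, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows that reach both thresholds
meets_both <- function(df, min_young = 20, min_income = 40) {
  ok <- df$ProportionYoungPeople >= min_young &
    df$ProportionHousesAbove100K >= min_income
  # which() drops rows where the comparison is NA
  df[which(ok), ]
}

targets <- meets_both(toy)
print(targets$NBHD_NAME)
```

Only "A" passes both thresholds here; "D" is excluded because its income percentage is missing. Listing the names this way avoids hard-coding them when the thresholds change.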

Conclusion

My suggestions are the Central Park, Hilltop and South Park Hill neighborhoods, because they have a higher proportion of people in the target age group and a higher proportion of households in the target income range than most other neighborhoods.