Competition - drinks promotions

Set-up

install.packages("plotly")
install.packages("patchwork")
install.packages("factoextra")


library(skimr)
library(tidyverse)
library(tidymodels)
library(plotly)
library(patchwork)


theme_set(theme_light())

Data

data <- 
  read_csv('./data/russian_alcohol_consumption.csv')

data %>% 
  skimr::skim_without_charts()

Knowing our data

Background

Your company owns a chain of stores across Russia that sell a variety of alcoholic drinks. The company recently ran a wine promotion in Saint Petersburg that was very successful. Due to the cost to the business, it isn't possible to run the promotion in all regions. The marketing team would like to target 10 other regions that have similar buying habits to Saint Petersburg where they would expect the promotion to be similarly successful.

Data dictionary

The marketing team has sourced you with historical sales volumes per capita for several different drinks types.

+-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | year | year (1998-2016) | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | region | Name of a federal subject of Russia. It could be oblast, republic, krai, autonomous okrug, federal city and a single autonomous oblast | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | wine | Sale of wine in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | beer | Sale of beer in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | vodka | Sale of vodka in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | champagne | Sale of champagne in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | brandy | Sale of brandy in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+

Objective/Goals

Recommend 10 additional regions they should select for the promotion.
Tell the story that supports your recommendations.

Notes

Region should be a factor
We need to pivot longer
Missing data across all numerical vars, but the complete rate looks good enough
We are looking for patterns of regions that behave more similar to Saint Petersburg. looks like a clustering task
There is ts data (historical sales for diferent regions) maybe time series models ?
Top regions with most "type" liters per cap sold
Do some summary and plot functions
do a clustering and maybe an interactive graph

Exploratory Data Analysis

data <- 
  data %>% 
  pivot_longer(-c(year,region),
               names_to = "type",
               values_to = "value")

Summaries of the data

data %>%
  na.omit() %>% 
  group_by(type) %>% 
  summarise(min = min(value),
            median = median(value),
            mean = mean(value),
            max = max(value),
            sd = sd(value),
            IQR = IQR(value))

data %>% 
  view()

data %>% 
  distinct(region) %>% 
  summarise(n = n())

data %>% 
  # filter(region == "Saint Petersburg") %>%
  group_by(type,region) %>% 
  summarise(mean = mean(value),
            median = median(value),
            min = min(value),
            max = max(value)) %>% 
  filter(type == "beer") %>% 
  arrange(desc(mean))
  

data %>% 
  filter(region == "Saint Petersburg") %>% 
  ggplot(aes(x = year, y = value))+
  geom_line()+
  geom_point()+
  facet_wrap(~type, scales = "free_y")+
  stat_smooth(method = "lm",formula = y ~ splines::bs(x, 3))

Whats happening in Saint Petersburg ?

Sells more beer than any other type of alchohol in average across all years
Beer presents a downward trend but the volume of sales its still more compared to the others
Vodka has a stepper downtrend
Wine and brandy possibly upper trend
Champange possibly downward trend

Look for regions with the most similar sells patterns (trends)

K-means clustering

data %>%
  mutate(year_type = paste(type,year),
         year_type = snakecase::to_snake_case(year_type)) %>% 
  select(-c(type,year)) %>%
  pivot_wider(names_from = year_type,
              values_from = value,
              values_fill = 0)

whats happening ? more columns than rows because there are 4 types of alcochol for 1998 to 2016, and dont know how to deal with it yet

For now lets summarise across years by the mean

clust_data <- 
  data %>% 
  filter(!is.na(value)) %>% 
  group_by(region,type) %>% 
  summarise(mean = mean(value)) %>%
  ungroup() %>% 
  pivot_wider(names_from = type,
              values_from = mean,
              values_fill = 0) %>%
  mutate(across(is.numeric,~ as.numeric(scale(.))))

k_clust <- kmeans(select(clust_data,-region),centers = 4)

tidy(k_clust)

glance(k_clust)

‌
‌
‌