Skip to content
Competition - drinks promotions
0
  • AI Chat
  • Code
  • Report
  • Set-up

    install.packages("plotly")
    install.packages("patchwork")
    install.packages("factoextra")
    
    
    library(skimr)
    library(tidyverse)
    library(tidymodels)
    library(plotly)
    library(patchwork)
    
    
    theme_set(theme_light())

    Data

    data <- 
      read_csv('./data/russian_alcohol_consumption.csv')
    
    data %>% 
      skimr::skim_without_charts()

    Knowing our data

    Background

    Your company owns a chain of stores across Russia that sell a variety of alcoholic drinks. The company recently ran a wine promotion in Saint Petersburg that was very successful. Due to the cost to the business, it isn't possible to run the promotion in all regions. The marketing team would like to target 10 other regions that have similar buying habits to Saint Petersburg where they would expect the promotion to be similarly successful.

    Data dictionary

    The marketing team has sourced you with historical sales volumes per capita for several different drinks types.

    +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | year | year (1998-2016) | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | region | Name of a federal subject of Russia. It could be oblast, republic, krai, autonomous okrug, federal city and a single autonomous oblast | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | wine | Sale of wine in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | beer | Sale of beer in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | vodka | Sale of vodka in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | champagne | Sale of champagne in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+ | brandy | Sale of brandy in litres by year per capita | +-----------------+----------------------------------------------------------------------------------------------------------------------------------------+

    Objective/Goals

    1. Recommend 10 additional regions they should select for the promotion.
    2. Tell the story that supports your recommendations.

    Notes

    • Region should be a factor

    • We need to pivot longer

    • Missing data across all numerical vars, but the complete rate looks good enough

    • We are looking for patterns of regions that behave more similar to Saint Petersburg. looks like a clustering task

    • There is ts data (historical sales for diferent regions) maybe time series models ?

    • Top regions with most "type" liters per cap sold

    • Do some summary and plot functions

    • do a clustering and maybe an interactive graph

    Exploratory Data Analysis

    data <- 
      data %>% 
      pivot_longer(-c(year,region),
                   names_to = "type",
                   values_to = "value")

    Summaries of the data

    data %>%
      na.omit() %>% 
      group_by(type) %>% 
      summarise(min = min(value),
                median = median(value),
                mean = mean(value),
                max = max(value),
                sd = sd(value),
                IQR = IQR(value))
    data %>% 
      view()
    
    data %>% 
      distinct(region) %>% 
      summarise(n = n())
    
    data %>% 
      # filter(region == "Saint Petersburg") %>%
      group_by(type,region) %>% 
      summarise(mean = mean(value),
                median = median(value),
                min = min(value),
                max = max(value)) %>% 
      filter(type == "beer") %>% 
      arrange(desc(mean))
      
    
    data %>% 
      filter(region == "Saint Petersburg") %>% 
      ggplot(aes(x = year, y = value))+
      geom_line()+
      geom_point()+
      facet_wrap(~type, scales = "free_y")+
      stat_smooth(method = "lm",formula = y ~ splines::bs(x, 3))

    Whats happening in Saint Petersburg ?

    • Sells more beer than any other type of alchohol in average across all years
    • Beer presents a downward trend but the volume of sales its still more compared to the others
    • Vodka has a stepper downtrend
    • Wine and brandy possibly upper trend
    • Champange possibly downward trend

    Look for regions with the most similar sells patterns (trends)

    K-means clustering

    data %>%
      mutate(year_type = paste(type,year),
             year_type = snakecase::to_snake_case(year_type)) %>% 
      select(-c(type,year)) %>%
      pivot_wider(names_from = year_type,
                  values_from = value,
                  values_fill = 0)

    whats happening ? more columns than rows because there are 4 types of alcochol for 1998 to 2016, and dont know how to deal with it yet

    For now lets summarise across years by the mean

    clust_data <- 
      data %>% 
      filter(!is.na(value)) %>% 
      group_by(region,type) %>% 
      summarise(mean = mean(value)) %>%
      ungroup() %>% 
      pivot_wider(names_from = type,
                  values_from = mean,
                  values_fill = 0) %>%
      mutate(across(is.numeric,~ as.numeric(scale(.))))
    k_clust <- kmeans(select(clust_data,-region),centers = 4)
    
    tidy(k_clust)
    
    glance(k_clust)