Skip to content
0

Where should a drinks company run promotions?

📖 Background

The company owns a chain of stores across Russia that sell a variety of alcoholic drinks. The company recently ran a wine promotion in Saint Petersburg that was very successful. Due to the cost to the business, it isn’t possible to run the promotion in all regions. The marketing team would like to target 10 other regions that have similar buying habits to Saint Petersburg where they would expect the promotion to be similarly successful.

The data

The marketing team has sourced the analyst team with historical sales volumes per capita for several different drinks types.

  • "year" - year (1998-2016)
  • "region" - name of a federal subject of Russia. It could be oblast, republic, krai, autonomous okrug, federal city and a single autonomous oblast
  • "wine" - sale of wine in litres by year per capita
  • "beer" - sale of beer in litres by year per capita
  • "vodka" - sale of vodka in litres by year per capita
  • "champagne" - sale of champagne in litres by year per capita
  • "brandy" - sale of brandy in litres by year per capita
library(tidyverse)
library(skimr)
data <- readr::read_csv('./data/russian_alcohol_consumption.csv')
skim(data)

Data for Saint Petersburg

The request from the marketing team is to find regions similar to Saint Petersburg. Let's first see what the behaviour and trends in Saint Petersburg are like.

data_stp <- data %>% filter(region == "Saint Petersburg") %>% select(-region)
skim(data_stp)

No missing values! Cool. Let's plot (...the doom of our competitors on the Russian market).

data_stp_long <- data_stp %>% pivot_longer(!year, names_to = "alcohol_type", values_to = "amount")
plot_stp <- ggplot(data_stp_long, aes(year, amount)) + geom_line() + facet_wrap(~alcohol_type, scales="free_y")
plot_stp

This rudimentary graph shows us that, between 1998 and 2016, sales of beer and vodka dropped in Saint Petersburg quite dramatically, while sales of brandy, champagne and wine first rose and then held more or less steady. This is the behaviour we will be seeking in the other regions.

Data for entire Russia - Missing values

Before we move on: While skimming before, we noticed that there were some missing values (NAs) in the table for the entire Russia. We need to decide what to do with them.

Let's start with finding out which regions don't have data for wine consumption, since it's the most important alcohol to our company.

data_missing <- data %>% filter(is.na(wine))
skim(data_missing)
skim(filter(data, !is.na(wine)))
regions_missing <- levels(as.factor(data_missing$region))
regions_missing

We now see that

/1/ Taking out rows which have missing values in the "wine" column takes care of almost all of the other missing values in the entire table (save 3 missing datapoints in the "brandy" column);

/2/ All of these missing values we've taken out are in the four regions:

  • Chechen Republic

  • Republic of Ingushetia

  • Republic of Crimea

  • Sevastopol.

There can be several reasons for the lack of data. Chechnya and its neighbour Ingushetia are majority-Muslim (see Wikipedia for Chechnya and Ingushetia) and they both have long-standing Islamist separatist movements, which endanger also wanna-be alcohol drinkers (see e.g. this article from Reuters from 2011). Also, in particular in Chechnya, the country suffered from two wars in 1994-1996 and 1999-2000. I imagine gathering statistics on alcohol consumption wasn't a priority in the post-war rebuilding.

As for the "Republic of Crimea" (quotes fully intended) and Sevastopol, they are internationally recognised as Ukrainian territory, unlawfully annexed by Russia in 2014.

All this politics aside, the conclusion here is that these four regions will not be of primary interest to the company.

data_nonmissing <- data %>% filter(!(region %in% c("Chechen Republic", "Republic of Ingushetia", "Republic of Crimea", "Sevastopol")))
skim(data_nonmissing)

Happily, removing the four regions took care also of the three remaining missing values in the "brandy" variable.

Data for entire Russia - Hierarchical clustering

Now that we have no missing data, it's time to compare the trends in all the regions of Russia remaining in the table with the trends in Saint Petersburg. To do this, the easiest way is to create a matrix containing a single row of datapoints containing values per year and alcohol type for each region, and then hierarchically cluster them. Then we'll see which cluster Saint Petersburg belongs to. We may think of it as comparing the shapes of the lines on the graph of amount of alcohol vs. (year-alcohol type), each line for a different region.

However, before we do that, there is one more thing to do. Because the amounts of alcohol are vastly different depending on type (much more beer, in liters, is drunk than e.g. brandy), the distance-calculating algorithm will pay different amount of attention to these various alcohol types (larger to the larger numbers). To attenuate this, we can scale the data around 0:

data_scaled <- data_nonmissing %>% mutate(wine = scale(wine), beer = scale(beer), vodka = scale(vodka), champagne = scale(champagne), brandy = scale(brandy))
head(data_scaled)

In this scaled dataset, every datapoint, independent of the type of alcohol, weighs around the same.

However, at this point, we need to ask ourselves - is complete equality really what we want? Shouldn't wine, our focus, be more important than other alcohols? Going back to skimming the datasets, much more beer or vodka is drunk than champagne or brandy. Perhaps we should weigh these alcohol types slightly differently (if not exactly to the extent of the original disparity between the amount of beer and the other alcohols).

This is, perhaps, both the easiest and the hardest question of this exercise. Easiest, because it is not for the analyst to answer, there are no data to support this or that particular set of weights - the weights are a question of business priorities. And hardest, because we would like our decisions to have some rational ground.

For the purpose of this exercise, I have decided that beer and vodka would be given 1/2 the importance of wine, and champagne and vodka 1/10.