Where should a drinks company run promotions?

💪 Competition challenge

  1. Recommend 10 additional regions they should select for the promotion.
  2. Tell the story that supports your recommendations.

Plan:

  1. Introduction
  2. Data preprocessing
  3. Approach 1: Data Analysis
  4. Approach 2: Supervised learning
  5. Approach 3: Unsupervised learning

Introduction:

To recommend other regions with the same buying habits as Saint Petersburg, I will apply three approaches. At the end, I will intersect the results of the three approaches to get the best 10 recommendations.
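The intersection step could be sketched roughly as follows. This is only an illustration: the three sets and the "Region A" to "Region F" names are hypothetical placeholders, not results from the analysis. The vote count is an assumed fallback for the case where fewer than 10 regions appear in all three sets.

```python
from collections import Counter

# Hypothetical outputs of the three approaches (placeholder names, not real results).
analysis_regions = {"Region A", "Region B", "Region C", "Region D"}
supervised_regions = {"Region B", "Region C", "Region D", "Region E"}
unsupervised_regions = {"Region C", "Region D", "Region F"}

# Regions that all three approaches agree on.
common_regions = analysis_regions & supervised_regions & unsupervised_regions
print(sorted(common_regions))

# Assumed fallback: count how many approaches selected each region,
# so regions picked by two approaches can fill any remaining slots.
votes = Counter()
for result in (analysis_regions, supervised_regions, unsupervised_regions):
    votes.update(result)
print(votes.most_common())
```

Plain Python sets keep the intersection explicit and make it easy to see which approach contributed which region.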

The data

The marketing team has provided historical sales volumes per capita for several different drink types.

  • "year" - year (1998-2016)
  • "region" - name of a federal subject of Russia. It can be an oblast, republic, krai, autonomous okrug, federal city, or the single autonomous oblast
  • "wine" - sale of wine in litres by year per capita
  • "beer" - sale of beer in litres by year per capita
  • "vodka" - sale of vodka in litres by year per capita
  • "champagne" - sale of champagne in litres by year per capita
  • "brandy" - sale of brandy in litres by year per capita

I will start by importing the required packages:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

The following list will contain the recommended regions. I will keep adding regions to it until the final approach.

recommended = []

A descriptive analysis is an important first step toward understanding the data. First, let us take a look at it.

df = pd.read_csv(r'./data/russian_alcohol_consumption.csv')
df.head()
df.info()

The summary shows that some columns contain missing values. Let's take a closer look.

df.isna().sum()
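The per-column totals do not show where the gaps are concentrated. One possible follow-up is to break the missing counts down by region, sketched below. The `toy` frame, its values, and the "Region A"/"Region B" names are made up for illustration. It only mimics the column layout described above, and the same pattern would apply to `df`.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of columns as the dataset; all values are made up.
toy = pd.DataFrame({
    "year": [1998, 1999, 1998, 1999],
    "region": ["Region A", "Region A", "Region B", "Region B"],
    "wine": [1.9, 2.1, np.nan, np.nan],
    "beer": [8.8, 9.0, np.nan, 5.0],
})

drink_cols = ["wine", "beer"]

# Count missing values per drink column within each region.
missing_by_region = toy[drink_cols].isna().groupby(toy["region"]).sum()
missing_by_region["total"] = missing_by_region.sum(axis=1)

# Keep only the regions that have at least one missing value.
print(missing_by_region[missing_by_region["total"] > 0])
```

If the gaps turn out to be confined to a handful of regions, those regions can be handled separately during preprocessing instead of imputing values across the whole dataset.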