
Motorcycle parts sales analysis

In this notebook, we analyze the company's sales across three warehouses in the metropolitan area. Besides answering the competition questions, we apply a clustering algorithm to sales-related columns to understand how our customers behave and which warehouses bring in the most sales.

Colleague's questions.

  1. What are the total sales for each payment method?
  2. What is the average unit price for each product line?
  3. Further investigation (e.g., average purchase value by client type, total purchase value by product line, etc.)
  4. Summary.

The notebook follows this plan:

Plan

  1. Data Analysis.
  2. Data Visualization.
  3. Clustering.
  4. Conclusions.

We start by importing the libraries used in the first two parts.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

Let us take a look at our data:

df = pd.read_csv('data/sales_data.csv', parse_dates=['date'])
df.head()
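Before aggregating, it can help to check the shape of the data, missing values, and the date range. The following is a minimal sketch, not part of the original analysis: `toy` is a hypothetical stand-in for the real file with made-up values, and `summarize` is an illustrative helper name. Only the column names come from the code in this notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for data/sales_data.csv; all values are made up.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2021-06-01", "2021-06-01", "2021-06-02"]),
    "warehouse": ["Central", "North", "Central"],
    "client_type": ["Retail", "Wholesale", "Retail"],
    "quantity": [8, 40, 5],
    "total": [134.83, 1320.00, None],  # one missing value on purpose
    "payment": ["Credit card", "Transfer", "Cash"],
})

def summarize(frame):
    """Return the shape, per-column missing counts, and the covered date range."""
    return {
        "shape": frame.shape,
        "missing": frame.isna().sum().to_dict(),
        "date_range": (frame["date"].min(), frame["date"].max()),
    }

summary = summarize(toy)
print(summary)
```

On the real data, the same check would be `summarize(df)`. Any non-zero count in `missing` is worth resolving before computing totals.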

Data Analysis

Sales per warehouse:

# Total sales per warehouse
df.groupby('warehouse')[['total']].sum()

# Average quantity by client type, shown as a horizontal bar chart
avg_units_client_type = df.groupby('client_type')['quantity'].mean()
avg_units_client_type.plot(kind='barh')
plt.show()

What are the total sales for each payment method?

df.groupby('payment')[['total']].sum().reset_index()
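The totals are easier to compare when each payment method is also expressed as a share of overall sales. The sketch below is one possible way to do this, assuming the `payment` and `total` columns used above. The `toy` frame holds made-up numbers, and `sales_by_payment` and `share_pct` are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration only.
toy = pd.DataFrame({
    "payment": ["Cash", "Credit card", "Transfer", "Credit card"],
    "total": [100.0, 250.0, 500.0, 150.0],
})

def sales_by_payment(frame):
    """Sum sales per payment method and add each method's share of all sales."""
    out = frame.groupby("payment", as_index=False)["total"].sum()
    out["share_pct"] = (out["total"] / out["total"].sum() * 100).round(1)
    # Largest payment method first, with a clean 0..n-1 index
    return out.sort_values("total", ascending=False, ignore_index=True)

result = sales_by_payment(toy)
print(result)
```

Calling `sales_by_payment(df)` would apply the same aggregation to the real sales data.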