Manifold Clustering
Motorcycle parts sales analysis
In this notebook, we analyze the company's sales across its three warehouses in the metropolitan area. Besides answering the competition questions, we apply a clustering algorithm to several sales columns to understand how our customers behave and which warehouses bring in the most sales.
Colleague's questions:
- What are the total sales for each payment method?
- What is the average unit price for each product line?
- Further investigation (e.g., average purchase value by client type, total purchase value by product line, etc.)
- Summary.
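Most of the questions above reduce to a grouped aggregation. The following is only a minimal sketch on made-up rows: the column names `product_line` and `unit_price` and all values are assumptions for illustration, and the real columns in `data/sales_data.csv` may be named differently.

```python
import pandas as pd

# Made-up rows for illustration only.
# `product_line` and `unit_price` are assumed column names, not confirmed by the data.
sample = pd.DataFrame({
    "client_type": ["Retail", "Wholesale", "Retail", "Wholesale"],
    "product_line": ["Brakes", "Brakes", "Engine", "Engine"],
    "unit_price": [20.0, 30.0, 60.0, 80.0],
    "total": [40.0, 600.0, 60.0, 800.0],
})

# Average unit price for each product line.
avg_price_by_line = sample.groupby("product_line")["unit_price"].mean()

# Average purchase value by client type.
avg_purchase_by_client = sample.groupby("client_type")["total"].mean()

# Total purchase value by product line.
total_by_line = sample.groupby("product_line")["total"].sum()

print(avg_price_by_line)
print(avg_purchase_by_client)
print(total_by_line)
```

Each result is a Series indexed by the grouping column, so it can be passed straight to `.plot(kind='barh')` for a quick visual check.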
 
The notebook follows this plan:
Plan
- Data Analysis.
- Data Visualization.
- Clustering.
- Conclusions.
 
We start by importing the libraries used in the first two parts.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
Let us take a look at our data:
df = pd.read_csv('data/sales_data.csv', parse_dates=['date'])
df.head()
Data Analysis
Sales per warehouse:
df.groupby('warehouse')[['total']].sum()
Average quantity by client type:
avg_units_client_type = df.groupby('client_type')['quantity'].mean()
avg_units_client_type.plot(kind='barh')
plt.show()
What are the total sales for each payment method?
df.groupby('payment')[['total']].sum().reset_index()
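To make the comparison between payment methods easier to read, the same aggregation can be sorted and extended with each method's share of overall sales. This is a sketch on made-up rows, not the notebook's actual output: the payment labels and amounts are hypothetical, and `share_pct` is a name introduced here for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from data/sales_data.csv.
sample = pd.DataFrame({
    "payment": ["Cash", "Credit card", "Transfer", "Credit card", "Cash"],
    "total": [100.0, 250.0, 450.0, 150.0, 100.0],
})

# Sum sales per payment method and sort from largest to smallest.
sales_by_payment = (
    sample.groupby("payment")[["total"]]
    .sum()
    .sort_values("total", ascending=False)
    .reset_index()
)

# Share of overall sales contributed by each payment method, in percent.
# `share_pct` is a hypothetical column name.
sales_by_payment["share_pct"] = (
    100 * sales_by_payment["total"] / sales_by_payment["total"].sum()
).round(1)

print(sales_by_payment)
```

Replacing `sample` with `df` should give the same table for the real data, assuming the `payment` and `total` columns contain no missing values.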