Skip to content
0

Reporting on sales data

Now let's now move on to the competition and challenge.

📖 Background

You work in the accounting department of a company that sells motorcycle parts. The company operates three warehouses in a large metropolitan area.

You’ve recently learned data manipulation and plotting, and suggest helping your colleague analyze past sales data. Your colleague wants to capture sales by payment method. She also needs to know the average unit price for each product line.

🔖 Data Summary

Throughout the 1000 sales we have analysed, we came to a conclusion that the location of the warehouse can dictate its performance. The Central Warehouse is outperforming in both quantity of units sold and total revenue generated when compared to the North Warehouse or West Warehouse. We have seen a good variety of units sold, but the most frequent sales are for Breaking Systems and Suspension & Traction parts with over 2300 units sold combined while also presenting the highest revenue received at over 140000$. One outlier that produced decent revenue at almost 38000 while having the lowest amount of units sold (627) is the Engine. Most of the engines were sold at the Central Warehouse (449) which presented itself to be one of the reason of its succes in total sales.

💾 The data

The sales data has the following fields:
  • "date" - The date, from June to August 2021.
  • "warehouse" - The company operates three warehouses: North, Central, and West.
  • "client_type" - There are two types of customers: Retail and Wholesale.
  • "product_line" - Type of products purchased.
  • "quantity" - How many items were purchased.
  • "unit_price" - Price per item sold.
  • "total" - Total sale = quantity * unit_price.
  • "payment" - How the client paid: Cash, Credit card, Transfer.

💪 Challenge

Create a report to answer your colleague's questions. Include:

  1. What are the total sales for each payment method?
  2. What is the average unit price for each product line?
  3. Create plots to visualize findings for questions 1 and 2.
  4. [Optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
  5. Summarize your findings.

✅ Checklist before publishing

  • Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
  • Remove redundant cells like the introduction to data science notebooks, so the workbook is focused on your story.
  • Check that all the cells run without error.

⌛️ Time is ticking. Good luck!

# Import libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')
sns.set_palette('pastel')
# Load and preview the data

sales = pd.read_csv('data/sales_data.csv')
sales.sample(15, random_state = 42)
# Check for null values

sales.isna().sum()
# View the datas summary statistics

print(sales.describe(),'\n', sales.describe(exclude = np.number))
# Check the data types

sales.dtypes

Observation

The dataset is clean and has the right types for its features. From the initial preview of the data, we have seen that the Central Warehouse made the most appearances in sales and the most frequent client type was Retail. The clients used their credit cards as their preffered payment method.

We will continue with the exploratory data analysis to find answers for the business questions.

Exploratory Data Analysis (EDA)

  1. What are the total sales for each payment method?
  2. What is the average unit price for each product line?
  3. Create plots to visualize findings for questions 1 and 2.
  4. [Optional] Investigate further (e.g., average purchase value by client type, total purchase value by product line, etc.)
  5. Summarize your findings.
# 1. What are the total sales for each payment method?

# Group the data by payment method and extract the total sum of sales
sales_by_payment_method = sales.groupby('payment')['total'].sum()
sales_by_payment_method.sort_values(ascending = False, inplace = True)

# Plot a barchart of total sales by payment method
fig = plt.figure(figsize = (10, 5))

ax = sns.barplot(x = sales_by_payment_method.index, y = sales_by_payment_method.values)

plt.title('Total sales by payment method', fontdict = {'fontweight': 'bold', 'fontsize': 15})
plt.xlabel('Payment Method')
plt.ylabel('Total Sales')
plt.bar_label(container = ax.containers[0], label_type = 'edge', fmt = '%1.0f')
plt.show()
# 2. What is the average unit price for each product line?

# Group the data by product line and extract average unit price

unitprice_by_productline = sales.groupby('product_line')['unit_price'].mean()
unitprice_by_productline

# Plot a barchart of average unit price by product line
fig = plt.figure(figsize = (12, 5))

ax = sns.barplot(x = unitprice_by_productline.index, y = unitprice_by_productline.values)

plt.title('Average unit price by product line', fontdict = {'fontweight': 'bold', 'fontsize': 15})
plt.xlabel('Product Line')
plt.ylabel('Average Unit Price')
plt.bar_label(container = ax.containers[0], fmt = '%1.2f')
plt.show()