Skip to content

Case Study Project - Office Supplies

Company Background

Pens & Printers is a national office supplies chain. At the moment, they send office supplies out of warehouses in four regions: East, West, South, and Central, and all four warehouses stock the same products.
The Head of Sales thinks this leads to large amounts of unsold products in some locations.

Customer Question

The management would like you to answer the following:
● Are there products that do not sell as well in some locations?
● Are there any other patterns over time in each region that you can find in the data

Dataset

Column nameDetails
Order IDCharacter. Unique identifier for the individual order.
Order DateCharacter. Date of the order, in format YYYY-MM-DD.
Ship ModeCharacter. The method used to send out the order.
RegionCharacter. The region the order was sent from.
Product IDCharacter. Unique identifier of the product ordered.
CategoryCharacter. Category of the product, one of ‘Office Supplies’,‘Furniture’, or ‘Technology’.
Sub-CategoryCharacter. Subcategory of the product (e.g. Binders, Paper, etc.)
Product NameCharacter. The name of the product.
SalesNumeric. Total value of the products sold in the order.
QuantityNumeric. Quantity of the products in the order.
DiscountNumeric. Discount of the order in decimal form. (e.g. 0.30 indicates the order has a 30% discount, etc.)
ProfitNumeric. Profit of the order.

Getting the data

# Load packages
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline 

# Set the style to use for remaining plots
sns.set_style("dark")

# Read in the data
df_raw = pd.read_csv("office_supplies.csv", parse_dates=['Order Date'])

Inspecting the data

df_raw.shape
df_raw.describe()
df_raw['Order Date'].min()
df_raw['Order Date'].max()
df_raw.head(3)

Exploring data types

df_raw.info()

Exploring categorical features

def plot_cat_count(df, title):    
    fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(15, 5))
    fig.suptitle(title)

    g1 = sns.countplot(data=df, y='Ship Mode', ax=ax[0], order=df['Ship Mode'].value_counts().index)
    g1.set(title='Ship Mode', xlabel=None, ylabel=None)
 
    g2 = sns.countplot(data=df, y='Region', ax=ax[1], order=df['Region'].value_counts().index)
    g2.set(title='Region', xlabel=None, ylabel=None)

    g3 = sns.countplot(data=df, y='Category', ax=ax[2], order=df['Category'].value_counts().index)
    g3.set(title='Category', xlabel=None, ylabel=None)

    g4 = sns.countplot(data=df, y='Sub-Category', ax=ax[3], order=df['Sub-Category'].value_counts().index)
    g4.set(title='Sub Category', xlabel=None, ylabel=None)
    plt.subplots_adjust(wspace=0.5)
plot_cat_count(df_raw, 'Counts of observations in categorical features')
df_columns = df_raw.columns
df_raw.columns