Case Study Office Supplies

Case Study Project - Office Supplies

Company Background

Pens & Printers is a national office supplies chain. At the moment, they send office supplies out of warehouses in four regions: East, West, South, and Central, and all four warehouses stock the same products.
The Head of Sales thinks this leads to large amounts of unsold products in some locations.

Customer Question

Management would like you to answer the following questions:
● Are there products that do not sell as well in some locations?
● Are there any other patterns over time in each region that you can find in the data?
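
One possible way to approach the first question is to cross-tabulate the quantity sold by sub-category and region, so that weak combinations stand out. The sketch below is illustrative only: the rows in `toy` are hypothetical and not taken from the dataset, and `quantity_by_region` is a helper name introduced here, not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy rows that mimic three of the dataset's columns (not real data)
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "South", "Central"],
    "Sub-Category": ["Paper", "Binders", "Paper", "Binders", "Paper", "Binders"],
    "Quantity": [5, 2, 3, 7, 4, 1],
})

def quantity_by_region(df):
    """Total quantity per sub-category (rows) and region (columns).

    Combinations with no orders are shown as 0 rather than NaN.
    """
    return df.pivot_table(
        index="Sub-Category",
        columns="Region",
        values="Quantity",
        aggfunc="sum",
        fill_value=0,
    )

table = quantity_by_region(toy)
print(table)
```

Applied to the real data, the same call would presumably take `df_raw` in place of `toy`. Whether `Quantity`, `Sales`, or `Profit` is the right measure of "selling well" is a judgment call that the brief leaves open.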

Dataset

Column name	Details
Order ID	Character. Unique identifier for the individual order.
Order Date	Character. Date of the order, in format YYYY-MM-DD.
Ship Mode	Character. The method used to send out the order.
Region	Character. The region the order was sent from.
Product ID	Character. Unique identifier of the product ordered.
Category	Character. Category of the product, one of ‘Office Supplies’, ‘Furniture’, or ‘Technology’.
Sub-Category	Character. Subcategory of the product (e.g. Binders, Paper).
Product Name	Character. The name of the product.
Sales	Numeric. Total value of the products sold in the order.
Quantity	Numeric. Quantity of the products in the order.
Discount	Numeric. Discount of the order in decimal form (e.g. 0.30 indicates the order has a 30% discount).
Profit	Numeric. Profit of the order.
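
The data dictionary above can also be turned into a quick sanity check once the file is loaded. The following is a minimal sketch, assuming the column names in the CSV match the table exactly. `EXPECTED_COLUMNS` and `missing_columns` are names introduced here for illustration, and `toy` is a hypothetical, deliberately incomplete frame.

```python
import pandas as pd

# Column names copied from the data dictionary above
EXPECTED_COLUMNS = [
    "Order ID", "Order Date", "Ship Mode", "Region", "Product ID",
    "Category", "Sub-Category", "Product Name",
    "Sales", "Quantity", "Discount", "Profit",
]

def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]

# Hypothetical frame with only three of the twelve columns
toy = pd.DataFrame(columns=["Order ID", "Region", "Sales"])
print(missing_columns(toy))
```

An empty list means every documented column is present. It says nothing about whether the values themselves are valid.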

Getting the data

# Load packages
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline 

# Set the style to use for remaining plots
sns.set_style("dark")

# Read in the data
df_raw = pd.read_csv("office_supplies.csv", parse_dates=['Order Date'])
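
To illustrate what `parse_dates` does in the call above, here is a small self-contained sketch that reads a hypothetical two-row CSV from memory instead of `office_supplies.csv`. The rows are invented for the example. The point is only that the `Order Date` column arrives as a datetime type, so date arithmetic works directly.

```python
import io

import pandas as pd

# Hypothetical CSV content (not taken from office_supplies.csv)
csv_text = """Order ID,Order Date,Sales
A-1,2015-01-03,16.45
A-2,2015-02-14,3.54
"""

toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["Order Date"])

# Order Date is parsed as datetime64 rather than left as a string
print(toy.dtypes)

# Date arithmetic works directly on the parsed column
span = toy["Order Date"].max() - toy["Order Date"].min()
print(span.days)
```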

Inspecting the data

df_raw.shape

df_raw.describe()

df_raw['Order Date'].min()

df_raw['Order Date'].max()

df_raw.head(3)

Exploring data types

df_raw.info()
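
`info()` reports non-null counts per column. A compact complement, sketched here under the assumption that missing values may be present (the brief does not say either way), is a table listing only the columns that actually contain gaps. `missing_summary` is a hypothetical helper and `toy` is invented data.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Count and share of missing values, for columns that have any."""
    out = pd.DataFrame({
        "missing": df.isna().sum(),   # number of missing entries per column
        "share": df.isna().mean(),    # fraction of rows that are missing
    })
    return out[out["missing"] > 0].sort_values("missing", ascending=False)

# Hypothetical frame with a few gaps (not real data)
toy = pd.DataFrame({
    "Sales": [10.0, 20.0, 5.0, 8.0],
    "Profit": [1.5, np.nan, np.nan, 0.4],
    "Region": ["East", "West", None, "South"],
})

summary = missing_summary(toy)
print(summary)
```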

Exploring categorical features

def plot_cat_count(df, title):
    """Plot observation counts for each categorical feature, most frequent first."""
    columns = ['Ship Mode', 'Region', 'Category', 'Sub-Category']
    fig, axes = plt.subplots(nrows=1, ncols=len(columns), figsize=(15, 5))
    fig.suptitle(title)

    for ax, col in zip(axes, columns):
        # Order the bars by descending frequency
        g = sns.countplot(data=df, y=col, ax=ax, order=df[col].value_counts().index)
        g.set(title=col, xlabel=None, ylabel=None)

    # Leave horizontal room between panels for the category labels
    plt.subplots_adjust(wspace=0.5)

plot_cat_count(df_raw, 'Counts of observations in categorical features')
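
The count plots give a visual impression. When exact numbers are needed, the same information can be read from `value_counts()`. The sketch below is a numeric counterpart under stated assumptions: `cat_counts` and `CATEGORICAL` are names introduced here, the ship mode labels are placeholders, and the rows in `toy` are hypothetical.

```python
import pandas as pd

# The four categorical features plotted above
CATEGORICAL = ["Ship Mode", "Region", "Category", "Sub-Category"]

def cat_counts(df, columns=CATEGORICAL):
    """Map each categorical column to its value counts, most frequent first."""
    return {col: df[col].value_counts() for col in columns if col in df.columns}

# Hypothetical rows; "Mode A" and "Mode B" are placeholder ship modes
toy = pd.DataFrame({
    "Ship Mode": ["Mode A", "Mode A", "Mode B", "Mode A"],
    "Region": ["East", "West", "East", "South"],
    "Category": ["Furniture", "Technology", "Office Supplies", "Office Supplies"],
    "Sub-Category": ["Paper", "Binders", "Paper", "Paper"],
})

counts = cat_counts(toy)
for col, series in counts.items():
    print(col)
    print(series, end="\n\n")
```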

df_columns = df_raw.columns
df_raw.columns
