Case Study Project - Office Supplies
Company Background
Pens & Printers is a national office supplies chain. It currently ships office supplies
from warehouses in four regions: East, West, South, and Central. All four warehouses
stock the same products.
The Head of Sales thinks this leads to large amounts of unsold stock in some locations.
Customer Question
The management would like you to answer the following:
● Are there products that do not sell as well in some locations?
● Are there any other patterns over time in each region that you can find in the data?
Dataset
Column name | Details |
---|---|
Order ID | Character. Unique identifier for the individual order. |
Order Date | Character. Date of the order, in format YYYY-MM-DD. |
Ship Mode | Character. The method used to send out the order. |
Region | Character. The region the order was sent from. |
Product ID | Character. Unique identifier of the product ordered. |
Category | Character. Category of the product, one of ‘Office Supplies’, ‘Furniture’, or ‘Technology’. |
Sub-Category | Character. Subcategory of the product (e.g. Binders, Paper). |
Product Name | Character. The name of the product. |
Sales | Numeric. Total value of the products sold in the order. |
Quantity | Numeric. Quantity of the products in the order. |
Discount | Numeric. Discount of the order in decimal form (e.g. 0.30 indicates the order has a 30% discount). |
Profit | Numeric. Profit of the order. |
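
One way the columns above could be combined to approach the first customer question is a pivot of units sold by sub-category and region. The sketch below is only an illustration: the `toy` rows are made up to mimic the documented columns, and the names `qty_by_region` and `share_by_region` are hypothetical, not part of the case study.

```python
import pandas as pd

# Hypothetical toy rows that mimic the documented columns; not real data.
toy = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "South", "Central"],
    "Sub-Category": ["Binders", "Paper", "Binders", "Paper", "Binders", "Paper"],
    "Quantity": [5, 2, 1, 7, 3, 4],
})

# Total units per sub-category (rows) and region (columns).
# Combinations with no orders are filled with 0.
qty_by_region = toy.pivot_table(
    index="Sub-Category",
    columns="Region",
    values="Quantity",
    aggfunc="sum",
    fill_value=0,
)

# Share of each region's units that comes from each sub-category,
# so regions with different overall volumes can be compared.
share_by_region = qty_by_region.div(qty_by_region.sum(axis=0), axis=1)

print(qty_by_region)
print(share_by_region.round(2))
```

Comparing shares rather than raw counts is a design choice: a sub-category with a low share in one region but a high share elsewhere is a candidate for a product that sells poorly in that location.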
Getting the data
# Load packages
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline
# Set the style to use for remaining plots
sns.set_style("dark")
# Read in the data
df_raw = pd.read_csv("office_supplies.csv", parse_dates=['Order Date'])
Inspecting the data
df_raw.shape
df_raw.describe()
df_raw['Order Date'].min()
df_raw['Order Date'].max()
df_raw.head(3)
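
Because `Order Date` is parsed as a datetime when the file is read, patterns over time per region could be explored by aggregating sales per month. The following is a minimal sketch on made-up rows; `toy`, `month`, and `monthly` are hypothetical names, and monthly granularity is an assumption rather than something the case study prescribes.

```python
import pandas as pd

# Hypothetical toy rows; the real data comes from office_supplies.csv.
toy = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2015-01-05", "2015-01-20", "2015-02-03", "2015-02-14"]
    ),
    "Region": ["East", "West", "East", "East"],
    "Sales": [100.0, 40.0, 25.0, 75.0],
})

# Convert each order date to its calendar month.
month = toy["Order Date"].dt.to_period("M").rename("Month")

# Sum sales per month and region, then spread regions into columns.
# Month/region pairs with no orders are filled with 0.
monthly = (
    toy.groupby([month, "Region"])["Sales"]
    .sum()
    .unstack(fill_value=0)
)

print(monthly)
```

The resulting table has one row per month and one column per region, which can be passed to a line plot to compare regional trends.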
Exploring data types
df_raw.info()
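
The output of `info()` can also be checked programmatically against the dataset description. The sketch below assumes, following the table above, that `Sales`, `Quantity`, `Discount`, and `Profit` should be numeric; the `toy` frame and the variable names are hypothetical stand-ins for `df_raw`.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Hypothetical toy rows standing in for df_raw; one Profit value is missing.
toy = pd.DataFrame({
    "Order Date": pd.to_datetime(["2015-01-05", "2015-01-20"]),
    "Sales": [100.0, 40.0],
    "Quantity": [2, 1],
    "Discount": [0.0, 0.3],
    "Profit": [12.5, None],
})

# Columns the dataset description lists as numeric.
numeric_cols = ["Sales", "Quantity", "Discount", "Profit"]

# Any documented numeric column that was not read as a numeric dtype.
non_numeric = [c for c in numeric_cols if not is_numeric_dtype(toy[c])]

# True when the order date was parsed into a datetime column.
date_parsed = is_datetime64_any_dtype(toy["Order Date"])

# Number of missing values per column.
missing = toy.isna().sum()

print(non_numeric, date_parsed)
print(missing)
```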
Exploring categorical features
def plot_cat_count(df, title):
    """Plot observation counts for four categorical columns, one panel each."""
    fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(15, 5))
    fig.suptitle(title)
    # Each panel is ordered from most to least frequent value
    g1 = sns.countplot(data=df, y='Ship Mode', ax=ax[0], order=df['Ship Mode'].value_counts().index)
    g1.set(title='Ship Mode', xlabel=None, ylabel=None)
    g2 = sns.countplot(data=df, y='Region', ax=ax[1], order=df['Region'].value_counts().index)
    g2.set(title='Region', xlabel=None, ylabel=None)
    g3 = sns.countplot(data=df, y='Category', ax=ax[2], order=df['Category'].value_counts().index)
    g3.set(title='Category', xlabel=None, ylabel=None)
    g4 = sns.countplot(data=df, y='Sub-Category', ax=ax[3], order=df['Sub-Category'].value_counts().index)
    g4.set(title='Sub-Category', xlabel=None, ylabel=None)
    # Leave room between panels for the category labels
    plt.subplots_adjust(wspace=0.5)
plot_cat_count(df_raw, 'Counts of observations in categorical features')
df_columns = df_raw.columns
df_raw.columns