BUSINESS PROBLEM

About Pens and Printers

Pens and Printers was founded in 1984 and provides high-quality office products to large organizations. We are a trusted provider of everything from pens and notebooks to desk chairs and monitors. We have built long-lasting relationships with our customers, and they trust us to provide the best products for them. As the way in which consumers buy products is changing, our sales tactics have to change too. Launching a new product line is expensive, and we need to make sure we are using the best techniques to sell the new product effectively.

New Product Sales Methods

Six weeks ago we launched a new line of office stationery. Despite the world becoming increasingly digital, there is still demand for notebooks, pens and sticky notes. Our focus has been on selling products that enable our customers to be more creative, with an emphasis on tools for brainstorming. We tested three sales strategies for this line: targeted email, phone calls, and a combination of the two.

Email: Customers in this group received an email when the product line was launched, and a further email three weeks later. This required very little work for the team.

Call: Customers in this group were called by a member of the sales team. On average, members of the team were on the phone for around thirty minutes per customer.

Email and call: Customers in this group were first sent the product information email, then called a week later by the sales team to talk about their needs and how this new product may support their work. The email required little work from the team; the call lasted around ten minutes per customer.

Goals

The sales team needs to know:

  • How many customers were reached with each approach.
  • What the spread of revenue looks like overall and for each method.
  • Whether revenue differs over time for each of the methods.
  • Which sales method we recommend continuing to use.

# Import all modules
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats
# import the dataset
df = pd.read_csv('product_sales.csv')
df.head()
df.info()

Data Cleaning and Validation

1. Validate the values in each column

# Validate values in each column
for column in df.columns:
    unique_values = df[column].unique()
    print(f"{column}:", unique_values)

We noticed inconsistent labels in the sales_method column: 'em + call' should be corrected to 'Email + Call', and 'email' should be corrected to 'Email'. We also noticed missing values in the revenue column.

# Correct the misspelled sales_method labels
df['sales_method'] = df['sales_method'].replace({'em + call': 'Email + Call', 'email': 'Email'})
df['sales_method'].unique()
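
As a check on the replacement above, the sketch below applies the same mapping to a toy column and confirms that only the three sales methods described earlier remain. The toy series and the names `label_fixes` and `expected_labels` are assumptions made for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Toy column standing in for df['sales_method']; the raw spellings mirror those found above.
raw = pd.Series(["Email", "email", "Call", "em + call", "Email + Call"])

# Same mapping as the replacement step; labels not in the mapping are left unchanged.
label_fixes = {"em + call": "Email + Call", "email": "Email"}
cleaned = raw.replace(label_fixes)

# Assumed set of valid labels, taken from the three sales methods described earlier.
expected_labels = {"Email", "Call", "Email + Call"}
unexpected = set(cleaned.unique()) - expected_labels

print(cleaned.value_counts())
print("Unexpected labels:", unexpected)
```

An empty `unexpected` set suggests the mapping covered every variant. Any leftover label would show up there and could be added to the mapping.
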
# Validate number of unique values in each column
for column in df.columns:
    n_values = df[column].nunique()
    print(f"{column}:", n_values)
# Validate null values in each column
null_counts = df.isnull().sum()
print(null_counts)

We found 1074 missing values in the revenue column. We fill each one with the mode of revenue for its sales method, treating revenue per sales method as discrete data.

# Mode of revenue for each sales method
mode_value = df.groupby('sales_method')['revenue'].apply(lambda x: x.mode().iloc[0]).reset_index()
print(mode_value)
# Replace missing values in 'revenue' column with the mode of each 'sales_method' group
df['revenue'] = df['revenue'].fillna(df.groupby('sales_method')['revenue'].transform(lambda x: x.mode().iloc[0]))
df.isnull().sum()
df.head()
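
To see how the group-wise fill behaves, here is a minimal sketch on invented toy data (not taken from product_sales.csv). It uses the same `groupby(...).transform(...)` pattern as above and, purely for comparison, also fills with the per-group median, which is a swapped-in alternative and not the method used in this analysis. The column names `revenue_mode_filled` and `revenue_median_filled` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data invented for illustration only.
toy = pd.DataFrame({
    "sales_method": ["Email", "Email", "Email", "Email", "Call", "Call", "Call", "Call"],
    "revenue": [90.0, 90.0, 110.0, np.nan, 50.0, 52.0, 54.0, np.nan],
})

grouped = toy.groupby("sales_method")["revenue"]

# Per-group mode, broadcast back to every row of the group.
# When all values in a group are unique (the Call group here), mode() returns
# every value sorted ascending, so .iloc[0] picks the group's smallest value.
group_mode = grouped.transform(lambda s: s.mode().iloc[0])

# Per-group median, shown only as a comparison.
group_median = grouped.transform("median")

toy["revenue_mode_filled"] = toy["revenue"].fillna(group_mode)
toy["revenue_median_filled"] = toy["revenue"].fillna(group_median)
print(toy)
```

In the Email group both fills agree because one value repeats. In the Call group no value repeats, so the mode-based fill falls back to the smallest observed revenue while the median fill lands in the middle. This is worth keeping in mind if revenue values in the real data rarely repeat.
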
# Inspect the numeric columns for negative values
df.describe()
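
The `describe()` summary shows each column's minimum, which is enough to spot negatives by eye. A more explicit check could count negative entries per numeric column, as in the sketch below. The toy frame and the `units` column are hypothetical and chosen only for illustration; on the real data the same two lines would be applied to `df`.

```python
import pandas as pd

# Toy frame with invented values; 'units' is a hypothetical column name.
toy = pd.DataFrame({
    "units": [10, 9, 11],
    "revenue": [95.5, -3.0, 110.2],
    "sales_method": ["Email", "Call", "Email + Call"],
})

# Keep only numeric columns, then count the entries below zero in each.
numeric = toy.select_dtypes(include="number")
negative_counts = (numeric < 0).sum()
print(negative_counts)
```

A count of zero in every column would indicate that no negative values need to be handled.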