Pens and Printers Report

Pens and Printers - A Report on Sales Analysis

Data

The sales rep has pulled some data from their sales tracking system for me. They haven’t included numbers for how much time was spent on each customer, but there may be some other useful customer information in here.

The data only relates to the new products sold. As there are multiple different products, the revenue will vary depending on which products were sold.

The data hasn’t been validated yet, so I'll make sure that to check it against all of the information in the table before I start the analysis. I'll provide a description of each processing and cleaning step.

Column Name	Details
week	Week sale was made, counted as weeks since product launch
sales_method	Character, which of the three sales methods were used for that customer
customer_id	Character, unique identifier for the customer
nb_sold	Numeric, number of new products sold
revenue	Numeric, revenue from the sales, rounded to 2 decimal places
years_as_customer	Numeric, number of years customer has been buying from us (company founded in 1984)
nb_site_visits	Numeric, number of times the customer has visited our website in the last 6 months
state	Character, location of the customer i.e. where orders are shipped

DataFrameas

df

variable

-- Explore the data in the table
SELECT *
FROM 'product_sales.csv'
LIMIT 5

Data Validation:

The data file has 15,000 rows and 8 columns (numeric and categorical data). Here are the cleaning steps taken for every column in the data:

After dropping the missing values from the revenue column, The Cleaned Dataset has 13,926 rows left.

import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns 


product_sales = pd.read_csv('product_sales.csv')


product_sales.head()

product_sales.info()

With the above summary, we can see that the revenue column has 13,926 values with 1,074 missing values.

Before dealing with missing values, first, let's check and correct column with str or character type.

print(product_sales.sales_method.unique())
print(' ')
print(product_sales.state.unique())
print(' ')

Loading and Cleaning Data accordingly

Column Name	Details
week	No change.
sales_method	Corrected. Misspelling such as `em + call` & `email` corrected to `Email + Call` & `Email`.
customer_id	No change.
nb_sold	No change.
revenue	Corrected. Rows with missing values were dropped.
years_as_customer	No change.
nb_site_visits	No change.
state	No change.

Now to change the method and the misrepresented values em + call, email. It should be Email + Call & Email.

product_sales.loc[product_sales.sales_method=='em + call', 'sales_method'] = 'Email + Call'
product_sales.loc[product_sales.sales_method=='email', 'sales_method'] = 'Email'

product_sales.sales_method.value_counts()

product_sales.isna().sum()

counts_method = product_sales.sales_method.value_counts(dropna=False)
na_revenue = product_sales.groupby('sales_method')['revenue'].count().reindex(counts_method.index)

na_counts = pd.concat([counts_method, na_revenue], axis=1).assign(NaNs=counts_method-na_revenue)
na_counts.assign(NaNs_pct=round(na_counts.NaNs/na_counts.sales_method, 2))

Since sales methods each has null values in the revenue column less than 15%, I'll simply remove those rows.

‌
‌
‌

Pens and Printers Report

.mfe-app-workspace-kj242g{position:absolute;top:-8px;}.mfe-app-workspace-11ezf91{display:inline-block;}.mfe-app-workspace-11ezf91:hover .Anchor__copyLink{visibility:visible;}Pens and Printers - A Report on Sales Analysis

Data Validation:

Loading and Cleaning Data accordingly

Pens and Printers - A Report on Sales Analysis