Skip to content

The Assignment

Data, Business Goal & Business Problem, Contents

Business Goal

Increase revenue on the new office stationery line by identifying and leveraging the most effective sales methods and customer segments from the initial six weeks of product launch.

Business Problem

We lack a data-driven understanding of which sales methods and customer segments have been most profitable during the initial six weeks of our new product launch.

Outline of Contents

  1. Data Validation: Overview and Steps
  • Overview
  • Steps to Validate
  • Additional Findings
  1. Exploratory Analysis: Big Picture - Customers and Revenue by Sales Method
  • Number of Customers by Sales Method
  • Revenue Distribution by Sales Method
  • Total Revenue by Sales Method
  1. Exploratory Analysis: Customer Segments - Week, Years as Customer, Website Visits, State
  • Revenue vs Week Since Product Launch
  • Revenue vs Years as Customer
  • Revenue vs Website Visits
  • Revenue vs State
  1. Exploratory Analysis: Gaps in Our Data - Missing Revenue and Missing Fields
  • Missing Revenue - What are we missing out on?
  • Missing Fields - Additional data that will help us dive deeper.
  1. New Metric: Average Revenue per Week
  • Proxy for Sales Cycle
  • Current Data - chart with baseline insights and limitations
  • Moving Forward - new data fields and true sales cycle
  1. Summary and Recommendations
  • Summary of Key Findings
  • Business Recommendations
  • Future Analysis

1. Data Validation: Overview and Steps

Step 1: Overview of the Data

The data was generally found as described in the information table provided, with several exceptions as described in Step 2 below.

The following were applied to all fields in the product_sales dataframe:

  1. Examined the data to see the number of rows and columns of the dataframe, how many missing and populated values for each field, data type for each field, and summary data on the numerical fields.
  2. Displayed the count of unique values as well as the unique values themselves for each field.

Step 2: Steps to Validate - Findings & Cleaning by Field

  • week - There are 0 missing values and 6 unique populated values from 1-6. No further action required.

  • sales_method - There are 0 missing values and 5 unique populated values out of an expected 3 unique populated values:

    • unique values: 'Call', 'Email', 'Email + Call', 'em + call', 'email'
    • expected values: 'Call', 'Email', 'Email + Call'
    • CLEANING REQUIRED: Replace the 'em + call' and 'email' values with 'Email + Call' and 'Email' respectively.
    • Cleaning is completed and confirmed as there are now only the 3 expected values in this field.
  • customer_id - There are 0 missing values and 15,000 unique populated values. No further action required.

  • nb_sold - There are 0 missing values and 10 unique populated values ranging from 7-16. No further action required.

  • revenue - There are 1,074 missing values and 6,744 unique populated values ranging from 32.54 to 238.32.

    • CLEANING REQUIRED: Examine missing revenue records vs populated revenue records to see if there are any patterns and provide a rationale for how we will handle the missing values. The data was split into these two groups and then evaluated by sales method groupings (Call, Email, Email + Call) as follows:
      • number sold vs sales method - EXAMINE FURTHER. Email campaign is the leading sales method for both, but the following contrast exists: missing revenue records have roughly double the sales count from the Email + Call campaign vs the Call campaign, while populated revenue records show the opposite (half the sales count from the Email + Call campaign vs the Call campaign). This contrast carries through each of the following visualizations which were examined.
      • number sold vs week (grouped by sales methods) - EXAMINE FURTHER. The trends are similar, but the missing revenue records show a more pronounced uptick in sales count from weeks 4-6 than on the populated revenue records.
      • number sold vs years as customer (grouped by sales methods) - No further examination necessary. The trendlines are similar with significantly higher sales to customer years 1-3, followed by a steep decline leveling out to near 0 around years customer years 15-20.
      • number sold vs state - No further examination necessary. The top states are the top states for both groups.
      • number sold vs number of website visits - No further examination necessary. Both plots follow a similar bell curve, with peak in sales count at or right around 25.
    • CLEANING CONCLUSIONS: There are several noteworthy differences between the missing revenue records and populated revenue records. There could be various factors why we see these differences, and these may be too complex to fully understand. Therefore, interpolated substitution of missing revenue values is not prudent and we will instead handle these groups separately as follows.
      • populated revenue records - Proceed with the Exploratory Data Analysis on only this group. There is a significantly large enough sample (over 13,000 records) from which to make valid observations.
      • missing revenue records - See Step 3: Additional Findings below
  • years_as_customer - There are 0 missing values and 42 unique populated values ranging from 0 to 63.

    • CLEANING REQUIRED: The company is only 41 years old (1984-2025), so the values equal greater than 41, which are 47 and 63 and occur once each, must be changed to 41.
    • Cleaning is completed and confirmed as the incorrect values have been changed to 41.
  • nb_site_visits - There are 0 missing values and 27 unique populated values ranging from 12 to 41. No further action required.

  • state - There are 0 missing values and 50 unique populated values, comprising the 50 states and all spelled correctly. No further action required.

Step 3: Additional Findings

  1. Missing revenue records are not random. My findings indicate a clear pattern. A disproportionately high percentage of missing revenue records comes from the Email + Call and Email sales methods, suggesting a potential systematic issue with how revenue is being logged for these methods.
  • Call: 4962 total records (181 are missing revenue = 3.65% of total)
  • Email: 7466 total records (544 are missing revenue = 7.29% of total)
  • Email + Call: 2572 total records (349 are missing revenue = 13.57% of total)
  1. Insights and Next Steps:
  • Further Data Validation
    • Attempt to gather revenue for the missing revenue records.
    • Investigate the data entry process to understand why revenue is not being recorded consistently, and propose solutions to ensure all future records have this information.
  • Performance Analysis
    • Examine the talk tracks and sales scripts for the Call and Email + Call methods. Since the Call method has a significantly lower rate of missing revenue, we can look for differences that may be contributing to this discrepancy.

2. Exploratory Analysis: Big Picture - Customers and Revenue by Sales Method

Number of Customers by Sales Method

Because Data Validation revealed 1,074 records with revenue missing and there appears to be a pattern to which records are missing revenue, please find below three pie charts showing customers by sales method for:

  1. Overall number of customers
  2. Number of customers for records where revenue field is populated
  3. Number of customers for records where revenue field is missing input

Note that in the Overall and Revenue Populated charts the sales method proportions are roughly the same with approximately 1/2 the customers reached via Email, 1/3 reached via Call, and the remaining 1/6 reached via Email + Call. The Revenue Missing chart also has approximately 1/2 the customers reached via Email but the other two proportions are flipped with approximately 1/3 reached via Email + Call and 1/6 reached via Call. This is important for two reasons:

First, visualizing which records the analysis will cover moving forward (Revenue Populated).

Second, understanding the significant difference between the Revenue Missing vs Revenue Populated records which, based on other findings of revenue by sales method, points to what is likely a significant amount of unaccounted for revenue in the Email and Email + Call categories.

Revenue Distribution by Sales Method

Considering the range of revenue per sale by sales method, we can see that Call was the lowest, then Email was roughly double Call, and Email + Call was roughly almost double Email. This visualization is important to understand the amount of revenue we can expect to draw per sale for each sales method.

NOTE: MOVING FORWARD, ALL VISUALIZATIONS AND DATA COMMENTARY WILL BE DRAWN FROM REVENUE POPULATED RECORDS ONLY, UNLESS OTHERWISE NOTED. For reasons why, see commentary above under section 1. Data Validation

Total Revenue by Sales Method

As we see in the Revenue Distribution box plot, the revenue generated by the Call method is lowest by a significant margin. Email + Call and Email generated substantially more revenue, and Email was the most profitable of the three.

Although further analysis will reveal ways to hone sales focus by customer segments, it is already clear that Email and Email + Call are the most effective methods for selling our new product line.

3. Exploratory Analysis: Customer Segments - Week, Years as Customer, Website Visits, State

Revenue vs Week Since Product Launch

Combined Revenue (All sales methods)

There are two ways to look at this:

  1. Cumulative Revenue (how fast we bring in the revenue)
  • 50%+ of revenue occurred in the first 3 weeks
  • 75%+ of revenue occurred in the first 5 weeks
  1. Highest Weeks (which weeks have the best sales)
  • 58% of revenue occurred in weeks 1 (20.9%), 5 (19.5%), and 4 (18.0%) combined
  • 73% of revenue occurred in weeks 1, 5, 4, and 2 (15.1%) combined

The prevailing pattern of highest to lowest revenue producing sales methods generally held from weeks 1-4 with Email highest, followed by Email + Call, and Call the lowest (although Call was slightly higher than Email + Call in weeks 1-2). In weeks 5-6, Email + Call overtook Email (and Call was slightly higher than Email in week 6).

Comparing Sales Methods (Email, Email + Call, Call)

Two noticeable trends emerged that will inform Recommendations below:

First, Email's revenue comparison to the next highest sales method:

  • Weeks 1-3 - Email was roughly 9x, 3x, and 2x the second highest method by week.
  • Week 4 - Email still brought in around 33% more than the second highest method.
  • Weeks 5-6 - Email was no longer top and fell steadily.

Second, Email + Call revenue comparison to other sales methods:

  • Weeks 1-4 - Email + Call steadily climbed and held as second highest method across these weeks combined.
  • Weeks 5-6 - Email + Call was top at roughly 67% more and then over 4x more than the second highest method in successive weeks.

Note: Because Call was generally the lowest revenue producing sales method and fluctuated little from week to week, we could assume that the Combined Revenue of All sales methods would follow a similar week-to-week pattern whether or not Call is included in future campaigns.

Revenue vs Years as Customer

Revenue was highest among the newest customers as 52% of all revenue came from customers from 0-4 years with us and 78% of all revenue came from customers in their first 8 years with us. The prevailing pattern of highest to lowest revenue producing sales methods held throughout with Email highest, followed by Email + Call, and Call the lowest.

Revenue vs Website Visits

There is a clear correlation between number of website visits and revenue, a smooth bell curve showing maximum revenue sum at 26 website visits and 88% of the revenue coming from those who visited the website between 20-30 times.

This insight highlights an opportunity for implementing automation in our sales process, which will be discussed below in the Recommendations section.

The prevailing pattern of highest to lowest revenue producing sales methods held up to 28 website visits with Email highest, followed by Email + Call, and Call the lowest. At 29 website visits and more, Email + Call and Email switched positions.

Revenue vs State

The following state performance scatter plot depicts all states, ranked from 1 as highest to 50 as lowest on both total revenue and average revenue per sale. The greater the size of the data point and the darker the shading, the higher the total revenue.

There does not appear to be a clear correlation between average revenue per sale and total revenue. At a glance, it appears that there is a correlation between state population and total revenue, but verification of this is beyond the scope of this project.

Several insights can be gained from this visualization, which may guide more targeted sales and marketing campaigns:

  1. Most room for improvement (upper right - low average and low total revenue)
  2. Total revenue growth, highest potential (upper left - high average but low total revenue)
  3. Increased upsells, highest potential (lower right - low average but high total revenue).

The total known revenue is $ 1,308,138.01.

The top 10 states by total revenue comprised over 50% of all known revenue. In order from 1 to 10, these were California, Texas, New York, Florida, Illinois, Pennsylvania, Ohio, Michigan, Georgia, and North Carolina.

4. Exploratory Analysis: Gaps in Our Data - Missing Revenue & Missing Fields

Missing Revenue - What are we missing out on?

It is estimated that we are missing approximately $128,000 of revenue from our missing revenue records based on the heatmap below and the calculations that went into creating it as described herein.

The greatest missing revenue is estimated to be:

  • Email + Call in weeks 6 and 5 (up to 15% increase to populated values)
  • Email in week 1
  • Email + Call in week 4 and Email in week 2

Week 6 missing revenue could represent up to a 15% increase over revenue reported for the same week in populated revenue records.

Realizing patterns in sales method, number sold, and week since product launch, these were my steps to creating the heatmap:

  1. From populated revenue records, I created a pivot table of sales method by week since product launch to generate a multiplier equal to average revenue per number sold ($):
  • Email + Call = 15.00 +/- 0.02 to 0.32
  • Email = 10.00 +/- 0.01 to 0.05
  • Call = 5.00 +/- 0.01 to 0.05
  1. From missing revenue records, I created a similar pivot table as in the previous step with sales method by week since product launch. The values in the following heatmap are achieved by multiplying the average revenue per number sold (see previous step) by the number of sales in the corresponding groupings from the missing revenue records.

Heatmap revenue low to high follows the color gradient from warm & light to cool & dark.

Missing Fields - Additional data that will help us dive deeper.

Two key metrics are presently not possible to calculate given the data available. A simple addition of a new set of records and two new fields for every record would enable us to track the following:

Conversion rate could be calculated with the addition of records for prospects who were contacted but did not make a purchase.

Generating all the other data found in the current records (except number sold and revenue) for these new records will enable us to not only obtain our conversion rate but also compare and understand which customer segment sales and marketing campaigns are the most efficient.

Sales cycle could be measured accurately with new data fields for first outreach date & time and purchase date & time.

In addition, these new fields will enable us to fully assess the efficiency of our sales methods. Currently we do not have enough information to know when first contact was made or when in the sales method cadence any of the sales were made.

5. New Metric: Average Revenue per Week per Sale

Proxy for Sales Cycle

Average Revenue per Week per Sale has been created as a proxy for sales cycle in the absence of the raw data needed to calculate sales cycle. It functions as a time-dependent measurement of the efficiency of each sale and sales method.

Current Data - chart with baseline insights and limitations

This new metric has been applied to all revenue populated records, dividing revenue by week since product launch.

Addressing our interest in the relative effectiveness of our different sales methods, the following bar chart depicts baseline figures that reinforce our other findings indicating that Email and Email + Call have proven significantly more effective sales methods than Call.

The exact average revenue per week per sale ($) is:

  • All = 40.94
  • Email = 54.52
  • Email + Call = 49.38
  • Call = 17.35

Our limitation to week without day or time of day specificity means that the true difference between Email and Email + Call may be insignificant. However, the approximately 3x difference in revenue between the Email and Email + Call methods versus Call, along with other data, clearly demonstrates that Email and Email + Call are the superior sales methods.

Moving Forward - new data fields and true sales cycle

When we begin recording day and time of first outreach and closed sales, we will have the raw data necessary to calculate true sales cycle.

Average Revenue per Week Per Sale will create a like-to-like comparison of the sales methods' effectiveness between this original data and the new records we add.

Assuming sales cycle will be measured in days, at that point we will convert the numbers on this initial data to average revenue per day per sale by dividing the above numbers by 7 (days in a week).

Moving forward, an adjusted metric of Average Revenue per Day per Sale will have two benefits:

  1. Tracking the overall effectiveness of sales methods (or sales campaigns to different customer segments)
  2. Live measure of effectiveness for the sales team, where each team member will track the efficiency of their sales performance in comparison to baseline metrics established from this and future campaigns

6. Summary and Recommendations

Based on our analysis of sales methods and customer segments from the first six weeks of the product launch, opportunities for increasing revenue on the new office stationery line are promising. Our data shows clear patterns of success that can guide our future sales strategy.

Summary of Key Findings

  • The Email and Email + Call methods consistently outperformed the Call method, yielding significantly more total revenue and revenue per sale.
  • These top-performing methods are also the most time-efficient for our sales team, with the Email method generating substantial early-stage revenue and the Email + Call method driving stronger results in later weeks.
  • We have identified key customer segments that are most receptive to our product line, including prospects who have been customers for 0-8 years and those who have visited our website 20-30 times.

Business Recommendations

The following recommendations are designed to capitalize on these findings, improve our sales methods, and ensure we are tracking the right metrics for future success.

Simple Data Improvements to Drive Metrics

  1. Gather records for non-buying prospects to track conversion rate.
  2. Add two fields to our records: first outreach day & time and closed sale day & time to track sales cycle and refine our new metric to Average Revenue per Day per Sale.

Sales Method Adjustments to Drive Sales

  1. Abandon the Call sales method and only call after an email has been sent.
  2. Merge the Email and Email + Call sales methods based on the following data-informed reasoning:
  • Email method: the first email was more effective than the second email and the email in the Email + Call method.
  • Email + Call method: The call following the email produced both a steady revenue increase in later weeks and a higher average revenue than the Email method alone.
  • We should examine the emails and calls to identify the most effective templates and scripts in order to refine our combined approach.
  1. Implement an automated trigger system to engage customers who have visited the website 20 to 30 times. This system should notify a sales representative with key prospect information to facilitate timely outreach.

Future Analysis

While our initial analysis provides a strong foundation, the following customer segments warrant further investigation to uncover additional opportunities for revenue growth and resource allocation.

  • Website Visitors: A deeper dive into the behavior of customers who visit the website more than 20 times could reveal more granular insights into their customer journey.
  • Top-Performing States:
    • Analyze the top 10 states by total revenue to identify any shared characteristics or market trends that could be replicated in other states
    • Analyze sales performance by regions to identify any shared characteristics or market trends.
    • Design targeted campaigns for the states with highest potential for both total revenue growth and increased upsells.