Product Sales Data Analysis for Pens and Printers
Introduction
Context:
“Pens and Printers”, established in 1984, provides office products to large organizations. As consumer buying behavior evolves, the company must adapt its sales tactics to ensure effective product launches. Recently, a new line of creative office stationery was launched, and three sales methods were tested to find the best approach to drive sales:
- Email: Customers received two emails with product information (minimal effort from the sales team).
- Call: Customers were called directly by sales reps, with each call lasting around 30 minutes.
- Email + Call: Customers received an email followed by a 10-minute phone call.
Since launching the new product line six weeks ago, the sales team has collected data on sales performance. However, they did not provide exact time spent on the Email method beyond mentioning that it required "very little work." For the purpose of this analysis, I assumed that:
- The Email method took no more than 1 minute per customer (leading to a total of 7,465 minutes for the entire email campaign).
- The Email + Call method took a total of 11 minutes per customer (1 minute for email + 10 minutes for follow-up call).
The goal of this analysis is to answer the following questions for the sales team:
- How many customers were engaged through each approach?
- What does the revenue distribution look like overall, and how does it differ for each method?
- Were there any changes in revenue over time for each method?
- Which sales method should the team continue to use, balancing revenue outcomes and the effort required?
- Are there any other customer-specific insights that can inform future sales strategies?
Data validation
Categorical data:
- sales_method: Inconsistent labels were corrected so that each record maps to one of the three methods, and the values were standardized (leading/trailing spaces removed, case made uniform).
- state: The values in the column were standardized (leading/trailing spaces removed, case made uniform).
- customer_id: As a unique identifier, it didn’t require transformation, so no changes were made.
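The standardization of the categorical columns can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the raw variants in the mapping (such as "em + call") and the helper names standardize_method and standardize_state are assumptions, since the report does not list the actual inconsistencies found.

```python
# Mapping from normalized raw labels to canonical method names.
# The raw variants are hypothetical; the report does not list the real ones.
CANONICAL = {
    "email": "Email",
    "call": "Call",
    "email + call": "Email + Call",
    "em + call": "Email + Call",  # assumed abbreviation, for illustration only
}


def standardize_method(raw: str) -> str:
    """Trim whitespace, normalize case, and map to a canonical label."""
    key = " ".join(raw.strip().lower().split())
    if key not in CANONICAL:
        raise ValueError(f"Unexpected sales_method value: {raw!r}")
    return CANONICAL[key]


def standardize_state(raw: str) -> str:
    """Trim whitespace and apply title case to a state name."""
    return " ".join(raw.strip().split()).title()


raw_methods = ["Email", " email ", "CALL", "em + call", "Email + Call"]
cleaned = [standardize_method(m) for m in raw_methods]
print(cleaned)
```

Raising an error on an unknown label, rather than passing it through, makes any new inconsistency visible the next time the data is refreshed.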
Numerical data:
- revenue: The column had a substantial number of missing values, which were imputed using the median revenue for each sales method, because the median represents the central tendency better than the mean when outliers are present. All values were formatted as floats with two decimal places.
- years_as_customer: These values were converted to integers to maintain data integrity and were validated by checking against the company’s founding year. Two invalid entries, where the number of years exceeded the company’s age, were removed.
- week: The week column, recording the number of weeks since product launch, was checked to ensure all values were between 1 and 6 and formatted as integers.
- nb_sold: The number of products sold was validated to confirm that all values were non-negative integers, ensuring valid product sales counts.
- nb_site_visits: The number of site visits was checked to ensure all values were non-negative integers, maintaining consistency.
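The two numerical steps that change the data (median imputation per sales method and removal of impossible tenure values) can be sketched as below. The sample rows are hypothetical, and ANALYSIS_YEAR is an assumption because the report does not state the year in which the analysis was run. The order of the two steps is also an assumption; here invalid rows are dropped before the medians are computed.

```python
from statistics import median

# Hypothetical sample rows; None marks a missing revenue value.
rows = [
    {"sales_method": "Email", "revenue": 95.0, "years_as_customer": 3},
    {"sales_method": "Email", "revenue": None, "years_as_customer": 10},
    {"sales_method": "Email", "revenue": 99.0, "years_as_customer": 63},
    {"sales_method": "Call", "revenue": 48.0, "years_as_customer": 1},
    {"sales_method": "Call", "revenue": 52.0, "years_as_customer": 0},
    {"sales_method": "Call", "revenue": None, "years_as_customer": 7},
]

FOUNDING_YEAR = 1984  # stated in the report
ANALYSIS_YEAR = 2024  # assumption: the analysis year is not given in the report
max_tenure = ANALYSIS_YEAR - FOUNDING_YEAR

# Keep only rows whose tenure does not exceed the company's age.
valid = [r for r in rows if 0 <= r["years_as_customer"] <= max_tenure]

# Median of the observed (non-missing) revenue within each sales method.
observed = {}
for r in valid:
    if r["revenue"] is not None:
        observed.setdefault(r["sales_method"], []).append(r["revenue"])
medians = {method: median(values) for method, values in observed.items()}

# Fill missing revenue with the method's median and round to two decimals.
for r in valid:
    if r["revenue"] is None:
        r["revenue"] = medians[r["sales_method"]]
    r["revenue"] = round(float(r["revenue"]), 2)

print(medians)
```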
Exploratory Analysis
1. Number of customers by sales method
The first question was how many customers there were for each approach. After cleaning the data, I grouped customers by the sales method used and found the following:
- Email: 7,465 customers
- Call: 4,961 customers
- Email + Call: 2,572 customers
The Email method had the highest number of customers. This tells us that this method had the broadest reach, likely because it is less time-consuming for the team and easily scalable. In contrast, the Call method reached fewer customers, which aligns with its more time-consuming nature. The Email + Call method had the smallest number of customers, indicating it was the least scalable of the three.
I used a bar chart as it provides a clear visual representation of customer counts across the three methods.
2. Revenue spread (overall and by method)
To understand how well each method performed in terms of revenue, I calculated the overall revenue distribution and the revenue spread for each sales method.
I used a histogram to visualize the distribution because it gives an overall view of the revenue spread.
Overall revenue findings:
- Revenue ranged from a minimum of USD32.54 to a maximum of USD238.32.
- The average revenue was USD95.57, with the median revenue at USD90.95.
Since revenue has a wide range and includes outliers, I chose the median as a better measure of central tendency. It provides a more realistic picture of what a "typical" transaction looks like.
I used a box plot as it effectively highlights the differences in the revenue spread between the methods.
Revenue by method findings:
- Email + Call had the highest typical (median) revenue at USD184.77, indicating that customers in this group were generating significantly more revenue per transaction compared to the other methods.
- Call had the lowest typical (median) revenue at USD49.07, suggesting that, in addition to reaching fewer customers than Email, it also brought in less revenue per transaction.
3. Revenue over time by sales method
To explore how revenue evolved over time for each method, I calculated the average revenue for each week and each sales method, using the mean to capture the general trend. To provide further insight, I determined the percentage change in average revenue from the first to the last week, reflecting the relative growth within each method.
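The weekly averaging and the first-to-last-week percentage change described above can be sketched as follows. The records are hypothetical values chosen for illustration and do not reproduce the results of this analysis; the function name pct_change_first_to_last is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (week, sales_method, revenue) records, for illustration only.
records = [
    (1, "Email", 80.0), (1, "Email", 100.0),
    (6, "Email", 120.0), (6, "Email", 150.0),
    (1, "Call", 30.0), (1, "Call", 50.0),
    (6, "Call", 60.0), (6, "Call", 80.0),
]

# Collect revenue values per (method, week).
grouped = defaultdict(list)
for week, method, revenue in records:
    grouped[(method, week)].append(revenue)

# Average revenue for each method in each week.
weekly_mean = {key: mean(values) for key, values in grouped.items()}


def pct_change_first_to_last(method: str) -> float:
    """Percentage change in mean revenue from the first to the last observed week."""
    weeks = sorted(w for (m, w) in weekly_mean if m == method)
    first = weekly_mean[(method, weeks[0])]
    last = weekly_mean[(method, weeks[-1])]
    return (last - first) / first * 100


for method in ("Email", "Call"):
    print(method, round(pct_change_first_to_last(method), 1))
```

Because the change is measured relative to each method's own first-week baseline, a method with low absolute revenue can still show the largest percentage increase.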
I visualized these trends with an unstacked area plot where each method’s revenue is shown with filled-in areas, allowing direct comparison of how each method’s revenue evolved over time. This visualization emphasizes the changes in revenue for each method independently.
The analysis revealed the following:
This shows that Call experienced the most notable percentage increase in revenue relative to its initial baseline, even though its average revenue remained the lowest. Email falls in the middle both in terms of percentage increase and average revenue, making it a steady performer throughout the six weeks. However, Email + Call consistently generated higher average revenue per customer across the six weeks, demonstrating significantly better performance in terms of total revenue generated per customer.
4. Recommended sales method
To determine the most effective and efficient sales strategy, I compared the methods on total revenue, number of customers, and effort (in minutes per customer), and complemented this with my earlier findings on the revenue spread across sales methods.
Revenue and effort comparison for each method:
Key insights:
- The Email method combines the highest total revenue (USD724,215.56) with the greatest revenue per minute of effort (USD97.01), making it the best option for high-volume campaigns with minimal time investment.
- The Email + Call method produces the highest median revenue per customer (USD184.77). While this approach requires more time, it delivers higher per-customer revenue. It’s best suited for high-value customers who need more personalized attention.
- The Call method remains the least efficient overall in terms of revenue generation and time investment. Despite the percentage increase, it generated the lowest total revenue (USD236,394.69), revenue per minute of effort (USD1.59), and median revenue per customer (USD49.07), making it the least effective option.
5. Customer differences across groups
To offer a deeper understanding of what contributed to the success, I conducted a cohort analysis to examine how customer tenure correlates with revenue changes for each sales strategy. I used a heatmap with a logarithmic scale to enhance the visualization of total revenue distribution across different cohorts of customer tenure and sales methods.
Key insights:
- Newer customers drive most revenue: The 0-5 years cohort generates the highest total revenue, contributing USD827,000.36 across all three sales methods. This represents 57.7% of the total revenue generated across all methods and cohorts. Of this, the Email approach accounts for USD415,149.95, or 50.2% of the revenue from the 0-5 years cohort, making it the most effective strategy for engaging new customers.
- Effectiveness of Email method across all cohorts: The Email method consistently outperforms both Call and Email + Call across all customer cohorts, including long-term customers. This suggests that the Email strategy is highly effective regardless of customer tenure, even among loyal customers.
- Decline in revenue with customer longevity: There is a noticeable decline in revenue as customers remain with the company for longer periods, across all methods. After the 6-10 years cohort, the total revenue contribution decreases significantly, a trend that holds across all sales methods.
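The cohort grouping behind these insights can be sketched as below. Only the "0-5" and "6-10" cohorts are named in this report, so the five-year width of the later bins is an assumption, as are the function name tenure_cohort and the sample rows. The final lines recompute the Email share of the 0-5 years cohort from the two figures quoted above.

```python
from collections import defaultdict


def tenure_cohort(years: int, width: int = 5) -> str:
    """Map years_as_customer to a label such as '0-5', '6-10', '11-15'."""
    if years < 0:
        raise ValueError("years must be non-negative")
    if years <= width:
        return f"0-{width}"
    # Later cohorts cover (k * width + 1) through ((k + 1) * width).
    k = (years - 1) // width
    return f"{k * width + 1}-{(k + 1) * width}"


# Hypothetical (years_as_customer, sales_method, revenue) rows.
rows = [
    (2, "Email", 100.0),
    (5, "Call", 50.0),
    (6, "Email", 90.0),
    (12, "Email + Call", 180.0),
    (3, "Email", 110.0),
]

# Total revenue per (cohort, method): the values a heatmap would display.
totals = defaultdict(float)
for years, method, revenue in rows:
    totals[(tenure_cohort(years), method)] += revenue

# Email share of the 0-5 years cohort, using the figures quoted in this report.
email_share = round(415_149.95 / 827_000.36 * 100, 1)
print(dict(totals), email_share)
```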
Definition of a metric for the sales team to monitor
To effectively track the performance of the sales strategies and ensure they align with business goals, I recommend monitoring the Revenue per Minute of Effort (RPME) as a core metric. This metric combines two crucial factors: total revenue generated and the time invested by the sales team, helping to identify which approach delivers the highest financial return for the least amount of effort. To calculate RPME for a sales method, divide the total revenue it generated by the total time, in minutes, spent on that method.
From the analysis, the initial values of RPME for each method are:
- Email: USD97.01
- Email + Call: USD16.71
- Call: USD1.59
These values set the baseline for how efficiently each method is currently performing. The sales team can compare future RPME values to these initial metrics to assess whether certain methods are improving, staying consistent, or declining in efficiency over time. Continuous monitoring will enable the team to adjust strategies for optimal results.
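As a check, the Email and Call baselines can be reproduced from figures stated in this report. The sketch below is illustrative: the function name rpme is hypothetical, and the 1-minute effort for Email is the assumption made in the Introduction. Email + Call is omitted because its total revenue is not quoted in the text.

```python
# Figures stated in this report.
customers = {"Email": 7_465, "Call": 4_961}
total_revenue = {"Email": 724_215.56, "Call": 236_394.69}

# Minutes per customer: 30 for Call (stated), 1 for Email (assumption).
minutes_per_customer = {"Email": 1, "Call": 30}


def rpme(revenue: float, n_customers: int, minutes_each: float) -> float:
    """Revenue per Minute of Effort: total revenue divided by total minutes."""
    total_minutes = n_customers * minutes_each
    if total_minutes <= 0:
        raise ValueError("total effort must be positive")
    return revenue / total_minutes


baseline = {
    method: round(
        rpme(total_revenue[method], customers[method], minutes_per_customer[method]),
        2,
    )
    for method in customers
}
print(baseline)
```

The Email value depends directly on the assumed 1 minute per customer, so it should be recalculated if the sales team later provides a measured figure.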
Summary
The analysis identified that the Email method had the broadest reach, engaging the most customers (7,465) and generating the highest total revenue (USD724,215.56). It also proved to be the most efficient, delivering USD97.01 in revenue per minute of effort, making it ideal for large-scale, low-effort campaigns.
The Email + Call method produced the highest median revenue per customer (USD184.77), but required more time, making it better suited for high-value clients needing personalized interaction.
In contrast, the Call method was the least effective, with the lowest total revenue (USD236,394.69) and poor time efficiency (USD1.59 per minute of effort).
Newer customers (0-5 years) contributed the most revenue across all methods, particularly through the Email method, which consistently outperformed the other strategies across all customer cohorts. Revenue declined notably with longer-tenured customers across all methods.
Recommendations
Addressing the sales team’s concerns about time investment and campaign costs, I recommend continuing with the Email method. This method delivers the highest total revenue and the best efficiency, making it a highly impactful and cost-effective choice for most sales campaigns. It has also proven to be the most successful across all customer tenures, including loyal clients.
When dealing with high-value clients who require more personalized attention and a tailored approach, the Email + Call method can be employed selectively. It generates the highest median revenue per customer and is well-suited for building long-term relationships and boosting individual returns.
I suggest discontinuing the Call method, as it does not provide the effectiveness needed for a successful sales strategy. With minimal revenue generation and inefficient use of time, the Call method proves to be the least productive choice.
Since newer customers (0-5 years) generate the majority of revenue, future sales efforts should focus primarily on this group. This tenure cohort has shown significantly higher revenue potential across all sales methods, making it the key focus for sales campaigns.
Although customers with more than 5 years of tenure contribute less revenue, they still provide significant value. Re-engagement strategies, such as loyalty programs or personalized marketing, could help boost purchasing activity. The Email method remains the key tool for targeting these customers effectively.
To maintain alignment with business goals, I suggest monitoring Revenue per Minute of Effort (RPME) as the key metric. This metric combines both total revenue and time spent, providing the sales team with a clear measure of efficiency.