
Call or Email, Why Not Both?


Dear Head of Analytics,

Thank you for the opportunity to review this data and offer my opinion. In this report I describe how I prepared the dataset for analysis, walk through how I explored the data with visualizations, discuss the important business metrics the data reveals, and give my recommendations for strengthening the business going forward.

1. Data Validation


To set up the data for exploration, I started by cleaning it. I first validated the data type of every column to confirm that each was appropriate for the values it holds.

I addressed the 'sales_method' column first, because so much of my analysis relies on every row fitting into one of three distinct categories. I noticed variations in capitalization and spelling, so I corrected this column to contain only 'Email', 'Call', and 'Email + Call'.
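As a rough illustration of this kind of label cleanup, the sketch below maps raw labels onto the three canonical categories. The raw variants, the `CANONICAL` mapping, and the `normalize_method` helper are all hypothetical; the report does not list the actual misspellings found in the data.

```python
# Hypothetical mapping from cleaned-up raw labels to the three canonical
# categories. The real variants in the dataset are not listed in the report.
CANONICAL = {
    "email": "Email",
    "call": "Call",
    "email + call": "Email + Call",
    "em + call": "Email + Call",  # assumed misspelling, for illustration only
}

def normalize_method(raw: str) -> str:
    """Map a raw sales_method label to one of the three canonical values."""
    # Trim, lowercase, and collapse repeated whitespace before the lookup.
    key = " ".join(raw.strip().lower().split())
    if key not in CANONICAL:
        # Fail loudly so an unknown variant is never silently kept.
        raise ValueError(f"Unexpected sales_method value: {raw!r}")
    return CANONICAL[key]

raw_labels = ["Email", "email", "CALL", " Email + Call ", "em + call"]
cleaned = [normalize_method(label) for label in raw_labels]
print(cleaned)
```

Raising an error on unknown labels is a design choice: it surfaces any new variant instead of letting a fourth category slip into the grouped analysis.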

I then handled missing values, which existed only in the 'revenue' column. Because the 'sales_method' column had already been corrected, I could use aggregate functions grouped by this category. Missing revenue values (7.16% of rows) were imputed with the median revenue of the corresponding sales method, which preserves otherwise valuable rows. I believe this approach roughly approximates the revenue associated with each sales method.
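A minimal sketch of per-group median imputation, using only the Python standard library and a handful of invented rows (the real dataset and the original code are not shown in this report):

```python
from statistics import median

# Invented example rows; None marks a missing revenue value.
rows = [
    {"sales_method": "Call", "revenue": 40.0},
    {"sales_method": "Call", "revenue": 50.0},
    {"sales_method": "Call", "revenue": None},
    {"sales_method": "Email", "revenue": 90.0},
    {"sales_method": "Email", "revenue": None},
    {"sales_method": "Email", "revenue": 100.0},
    {"sales_method": "Email", "revenue": 110.0},
]

# Collect the observed (non-missing) revenue values for each sales method.
observed = {}
for row in rows:
    if row["revenue"] is not None:
        observed.setdefault(row["sales_method"], []).append(row["revenue"])

# Median revenue per sales method, computed from observed values only.
medians = {method: median(values) for method, values in observed.items()}

# Fill each missing value with the median of that row's own sales method.
imputed = [
    {**row, "revenue": medians[row["sales_method"]] if row["revenue"] is None else row["revenue"]}
    for row in rows
]

# Share of rows that needed imputation, as a percentage.
missing_pct = 100 * sum(row["revenue"] is None for row in rows) / len(rows)
print(medians, round(missing_pct, 2))
```

Using the median of each method rather than one overall median keeps an imputed 'Call' row from inheriting a revenue level typical of 'Email + Call'.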

Finally, I addressed outliers in 'years_as_customer', ensuring that no value exceeded the company's age of 40 years, since the company was founded in 1984. I replaced any value greater than 40 with the maximum of 40.
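The capping step can be sketched as follows. The tenure values are invented, and the analysis year of 2024 is an assumption inferred from the stated founding year (1984) and company age (40 years):

```python
FOUNDING_YEAR = 1984
ANALYSIS_YEAR = 2024  # assumption: 1984 plus the stated company age of 40
MAX_TENURE = ANALYSIS_YEAR - FOUNDING_YEAR  # 40

# Invented years_as_customer values, including impossible ones above 40.
years_as_customer = [0, 3, 12, 40, 47, 63]

# Replace anything above the company age with the maximum possible tenure.
capped = [min(years, MAX_TENURE) for years in years_as_customer]

# Count how many values were changed, which is worth reporting with the fix.
n_capped = sum(years > MAX_TENURE for years in years_as_customer)
print(capped, n_capped)
```

Capping keeps the affected rows in the dataset; dropping them would be an alternative if the impossible tenure values cast doubt on the rest of the row.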

I also reviewed the dataset for duplicate rows and for any other abnormal values in the numeric and text columns, but found nothing else of major concern, so I moved on to the exploratory analysis.

See some results printed from the data cleaning below for proof:


2. Exploratory Analysis


To get a broad overview of the dataset, I first split the data by 'sales_method' into its three categories and compared two totals.

First, I tallied the number of customers tied to each sales method:

Call Customers = 4962 (33.08%)
Email Customers = 7466 (49.77%)
Email + Call Customers = 2572 (17.15%)

Note: Given this uneven distribution, per-customer averages are more informative than totals when comparing sales methods.

Across the entire dataset, after imputing median revenue for the missing values, total revenue is $1,433,489.46, with the following breakdown by method:

Call Revenue = $236,445.16 (16.49%)
Email Revenue = $724,313.35 (50.53%)
Email + Call Revenue = $472,730.95 (32.98%)
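The customer counts and revenue totals reported above can be cross-checked with a few lines of arithmetic. This sketch uses only the figures already quoted in this report; the variable names are my own.

```python
# Figures taken directly from the totals reported above.
customers = {"Call": 4962, "Email": 7466, "Email + Call": 2572}
revenue = {"Call": 236445.16, "Email": 724313.35, "Email + Call": 472730.95}

total_customers = sum(customers.values())
total_revenue = sum(revenue.values())

for method in customers:
    customer_share = 100 * customers[method] / total_customers
    revenue_share = 100 * revenue[method] / total_revenue
    # Average revenue per customer: the fairer comparison across
    # groups of unequal size.
    per_customer = revenue[method] / customers[method]
    print(f"{method}: {customer_share:.2f}% of customers, "
          f"{revenue_share:.2f}% of revenue, ${per_customer:.2f} per customer")

per_customer_avg = {m: revenue[m] / customers[m] for m in customers}
```

Dividing each total by its customer count gives roughly $47.65 per customer for 'Call', $97.01 for 'Email', and $183.80 for 'Email + Call'. These are means that include the imputed values.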

What is notable to me is that 'Email + Call' brings in far more revenue than its share of customers would suggest: about 17% of customers account for about 33% of total revenue. Here's a visual breakdown of both the count of customers per sales method and the total revenue per sales method:


Comparing Revenue by Sales Method

If we start by looking at the overall spread of revenue across the dataset in a histogram, we see larger spikes concentrated in three different regions. Once we group by sales method, the revenue distribution shows a much clearer pattern. Boxplots of revenue per sales method show noticeable differences: revenue per customer is lowest for 'Call' only, roughly double that for 'Email', and almost double again for 'Email + Call'.
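For readers who prefer numbers to plots, the quantities a boxplot draws can be computed directly. This is a sketch with invented revenue values; `five_number_summary` is a hypothetical helper, and the quartile method ("inclusive") is one of several conventions and may differ from what a plotting library uses.

```python
from statistics import quantiles

# Invented revenue samples per sales method, for illustration only.
revenue_by_method = {
    "Call": [42, 45, 47, 49, 52],
    "Email": [88, 93, 97, 101, 106],
    "Email + Call": [170, 178, 184, 190, 197],
}

def five_number_summary(values):
    """Return min, quartiles, and max: the values a boxplot is built from."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3, "max": max(values)}

summary = {method: five_number_summary(values)
           for method, values in revenue_by_method.items()}

for method, stats in summary.items():
    print(method, stats)
```

Comparing the medians and interquartile ranges side by side shows whether the groups overlap, which is what the boxplots convey visually.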


Before addressing revenue and sales method further, I wanted to explore other fields to see if there were any notable correlations to revenue, such as how many years a customer was a client, how many times they visited our website, or even the state the client lived in. For all these cases, I explored trends between these parameters and sales method as well as against revenue.

Analysis By Years as a Customer

Looking at how many years our customers have been buying our products, a histogram shows that the vast majority are relatively new, with less than 5 years as a customer.

While exploring the relationship between 'years_as_customer' and revenue, I analyzed both boxplots and scatterplots. The boxplots showed no notable differences in customer tenure across the different sales methods. Similarly, the scatterplot showed no clear correlation between 'years_as_customer' and revenue. The analysis did, however, reinforce the overall pattern that sales method is the dominant factor influencing revenue, with no noticeable impact from customer tenure.

Key Insight: There is no clear evidence that the number of years as a customer has a strong influence on revenue generation. The main clustering of revenue appears to be based on sales method rather than customer tenure.
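One way to back a visual impression like this with a number is a Pearson correlation coefficient, computed both overall and within each sales method. The sketch below uses invented rows and a hand-written `pearson` helper; it is not the code behind the plots in this report.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented rows: (sales_method, years_as_customer, revenue).
rows = [
    ("Call", 1, 46.0), ("Call", 6, 48.0), ("Call", 12, 47.0),
    ("Email", 2, 98.0), ("Email", 5, 95.0), ("Email", 10, 97.0),
    ("Email + Call", 0, 183.0), ("Email + Call", 4, 186.0), ("Email + Call", 9, 182.0),
]

# Correlation across all rows, ignoring sales method.
overall = pearson([r[1] for r in rows], [r[2] for r in rows])

# Correlation within each sales method, which removes the between-method gap.
by_method = {}
for method in ("Call", "Email", "Email + Call"):
    subset = [r for r in rows if r[0] == method]
    by_method[method] = pearson([r[1] for r in subset], [r[2] for r in subset])

print(round(overall, 3), {m: round(c, 3) for m, c in by_method.items()})
```

Checking the coefficient within each method matters because the large revenue gap between methods can mask or mimic a relationship in the pooled data.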


Analysis By Number of Site Visits

I took a similar approach when examining the relationship between 'nb_site_visits' and revenue. Looking at the overall number of site visits, most of the dataset centers around 25 visits. The boxplots grouped by sales method revealed a slight increase in average site visits from 'Call' to 'Email' to 'Email + Call', but the differences were minor. The scatterplot demonstrated the same clustered pattern by sales method as seen in previous analyses. As with 'years_as_customer', there is no strong correlation between the number of site visits and revenue, with sales method once again being the more impactful factor.

Key Insight: While 'nb_site_visits' slightly increases with more involved sales methods, it does not correlate with higher revenue on its own. The driving factor remains the sales method.
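The comparison of average site visits across methods can be sketched like this. The visit counts are invented and chosen only to mimic a small upward drift around 25 visits; they are not the dataset's values.

```python
from collections import defaultdict
from statistics import mean

# Invented rows: (sales_method, nb_site_visits).
rows = [
    ("Call", 22), ("Call", 24), ("Call", 26),
    ("Email", 23), ("Email", 25), ("Email", 27),
    ("Email + Call", 25), ("Email + Call", 27), ("Email + Call", 29),
]

# Group the visit counts by sales method.
visits = defaultdict(list)
for method, n_visits in rows:
    visits[method].append(n_visits)

# Average visits per method, and the gap between the highest and lowest group.
mean_visits = {method: mean(values) for method, values in visits.items()}
gap = max(mean_visits.values()) - min(mean_visits.values())
print(mean_visits, gap)
```

Setting the gap between group means against the spread within each group is a quick way to judge whether a difference like this is minor.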
