Data Analyst Practical Exam Submission

Data Validation

Before cleaning and validation, the dataset contained 1,500 rows and 8 columns. Each column was checked against the criteria provided in the dataset table:

  • week: 1,500 numeric values with no missing data. No cleaning was needed.
  • sales_method: 1,500 character values. There were inconsistencies in category names, which have been corrected.
  • customer_id: 1,500 unique character values, with no duplicates. No cleaning was needed.
  • nb_sold: 1,500 numeric values with no missing data. No cleaning was needed.
  • revenue: 13,926 numeric values, with 1,074 missing. Because the column contains outliers, the missing values were replaced with the median revenue, which is less sensitive to outliers than the mean.
  • years_as_customer: 1,500 numeric values with no missing data. Since the company was founded in 1984 and the current year is 2025, no customer can have been with us for more than 41 years. Two values exceeded this limit. These incorrect values were dropped.
  • nb_site_visits: 1,500 numeric values with no missing data. No cleaning was needed.
  • state: 1,500 character values with no missing data. No cleaning was needed.
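
The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the exact code used for this submission. The alias table `METHOD_ALIASES`, the `clean` helper, and the sample rows are hypothetical, since the report does not list the specific inconsistent labels. The sketch also assumes the median is computed over all non-missing revenue values before any rows are dropped, and that an invalid tenure removes the whole row.

```python
from statistics import median

FOUNDING_YEAR = 1984
CURRENT_YEAR = 2025
MAX_TENURE = CURRENT_YEAR - FOUNDING_YEAR  # 41 years

# Hypothetical alias table: the actual inconsistent labels are not listed in the report.
METHOD_ALIASES = {
    "email": "email",
    "call": "call",
    "email + call": "email + call",
    "em + call": "email + call",
}


def clean(rows):
    """Return cleaned copies of the rows; the input list is left untouched."""
    # Median of the revenue values that are present (robust to outliers).
    revenue_median = median(r["revenue"] for r in rows if r["revenue"] is not None)
    cleaned = []
    for r in rows:
        # Drop rows whose tenure is longer than the company has existed.
        if r["years_as_customer"] > MAX_TENURE:
            continue
        row = dict(r)
        # Normalize case and whitespace, then map known variants to one label.
        key = row["sales_method"].strip().lower()
        row["sales_method"] = METHOD_ALIASES.get(key, key)
        # Impute missing revenue with the median.
        if row["revenue"] is None:
            row["revenue"] = revenue_median
        cleaned.append(row)
    return cleaned


# Illustrative rows only, not taken from the real dataset.
sample = [
    {"sales_method": "Email", "revenue": 95.0, "years_as_customer": 3},
    {"sales_method": "em + call", "revenue": None, "years_as_customer": 1},
    {"sales_method": "call", "revenue": 45.0, "years_as_customer": 63},
    {"sales_method": "Call", "revenue": 50.0, "years_as_customer": 10},
]
cleaned = clean(sample)
```

In this sample, the row with 63 years as a customer is removed, the labels are normalized, and the missing revenue is filled with the median of the three known values.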

How do sales differ between different sales methods? Why is it important?

The records from the last 6 months show that most of our customers are new (they have been with us for less than 5 years). This highlights the importance of keeping them informed about our company's activities so that we can convert them into loyal, long-term customers.

The majority of our customers were from California, Texas, New York, and Florida.

As the plot below shows, most sales generated revenue below 125. There is therefore significant room to improve sales by focusing on the most effective sales methods.

We contacted our customers in three ways: email, call, and a combination of email and call. As the plot below shows, most customers were contacted by email, followed by call, and the fewest by both methods.

What is the best approach to contact customers?

The three plots below show that, although the email + call method reached the fewest customers, it generated the highest revenue, sold the most products, and attracted the highest number of site visits. Email came second and call third on these measures, apart from a few outliers.

The plot below also shows that, in every period, the revenue generated by the email + call combination was the highest, followed by email and then call. Compared to the beginning of the period, the revenue generated by the email + call method increased by 100 units, while the increase for the other methods was smaller. Even though calling customers is more expensive than emailing them, the call-only method generated lower revenue and sales. Note also that when we contact customers using the email + call combination, the call is shorter than with the call-only method, because some of the information is delivered by email.

Business Metrics

Since our goal is to find the best approach to sell the new product effectively, I would recommend using the average revenue per customer for each sales method over the next 6 months as our key metric.

The table below shows the average revenue per customer for each sales method over the last 6 months. An increase in these figures over the next 6 months would be a positive sign that we are achieving our goal.

Sales method    Average revenue per customer
call            47.60
email           97.13
email + call    183.65
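
As a rough sketch of how this metric could be tracked, the snippet below computes the average revenue per customer for each sales method. It assumes one row per customer (consistent with customer_id being unique) and uses a hypothetical helper `average_revenue_per_customer` with made-up sample rows. It is not the code that produced the figures in the table above.

```python
from collections import defaultdict


def average_revenue_per_customer(rows):
    """Map each sales method to its total revenue divided by its customer count."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["sales_method"]] += r["revenue"]
        counts[r["sales_method"]] += 1  # one row per customer is assumed
    return {method: round(totals[method] / counts[method], 2) for method in totals}


# Illustrative rows only, not taken from the real dataset.
sample = [
    {"customer_id": "a1", "sales_method": "call", "revenue": 40.0},
    {"customer_id": "a2", "sales_method": "call", "revenue": 50.0},
    {"customer_id": "b1", "sales_method": "email", "revenue": 100.0},
    {"customer_id": "c1", "sales_method": "email + call", "revenue": 180.0},
    {"customer_id": "c2", "sales_method": "email + call", "revenue": 190.0},
]
metric = average_revenue_per_customer(sample)
```

Running the same function on each new 6-month window and comparing the results against the baseline table would show whether the metric is moving in the right direction.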

Recommendation

For the coming weeks, I recommend that we focus on the following steps:

  1. Focus on the email and call combination: Since the email + call combination method generated the highest revenue, sold the most products, and attracted the highest number of site visits, it should be prioritized for future customer outreach.
  2. Reconsider the call-only method: Since the call-only method is more expensive than email and generated lower revenue and sales, we should reconsider using calls alone.
  3. Regional marketing focus: Provide more resources to high-performing states while exploring strategies to improve sales in lower-revenue regions.
  4. Customer engagement analysis: Conduct further analysis to determine if frequent site visits correlate with higher purchases and design marketing strategies accordingly.
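
For the customer engagement analysis in step 4, one possible starting point is the Pearson correlation between nb_site_visits and nb_sold. The sketch below is only an illustration under that assumption: the `pearson` helper and the sample values are hypothetical, and a correlation on its own would not establish that site visits cause purchases.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Illustrative values only, not taken from the real dataset.
site_visits = [20, 24, 26, 30, 35]
units_sold = [7, 9, 10, 12, 15]
r = pearson(site_visits, units_sold)
```

A value of `r` close to 1 would suggest that customers who visit the site more often also tend to buy more, which would support designing marketing strategies around site engagement.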