Data Analyst Professional Practical Exam Submission
You can use any tool that you want to do your analysis and create visualizations. Use this template to write up your summary for submission.
You can use any markdown formatting you wish. If you are not familiar with Markdown, read the Markdown Guide before you start.
Data Validation
The dataset originally contained 15,000 rows and 8 columns before cleaning and validation. I have validated all the columns against the criteria outlined in the dataset table:
- week: The dataset includes 6 recorded weeks (1, 2, 3, 4, 5, 6) with no missing values. All values are confirmed to be non-negative, so no cleaning was required.
- sales_method: Initially, there were 5 unique values. After cleaning, only 3 unique values remain (
Call
,Email
,Email + Call
), with no missing values, aligning with the provided description. - customer_id: Uniqueness was verified, and values were converted to string format. There are no missing values, consistent with the provided description, so no cleaning was needed.
- nb_sold: All values were confirmed as non-negative. Outliers were analyzed and retained if valid, with no missing values. No additional cleaning was necessary.
- revenue: Verified as numeric and non-negative. Extreme values were analyzed for validity, and no additional cleaning was needed.
- years_as_customer: Checked for unrealistic values, particularly those exceeding the company's age since 1984. Values greater than 41 years were removed. There were no missing values.
- nb_site_visits: Verified as non-negative integers. High visit counts were analyzed for customer behavior insights.
- state: Text formatting was standardized, and entries were checked for validity.
Final Dataset Summary:
After data validation and cleaning, the dataset now contains 14,998 rows and 8 columns, with no missing values.
Number of Customers for Each Approach :
- Email: 7465 customers
- Call: 4961 customers
- Email and call: 2572 customers
Overall Revenue Spread
The revenue distribution shows a clear distinction among the three sales methods.
There are some extreme values (outliers) in each method, particularly in Email + Call, indicating higher-value sales in some cases.
Revenue Per Sales Method
- Call: Has the lowest revenue spread, with most values concentrated around a lower revenue range (~$47.60 average).
- Email: Shows a moderate spread with a wider range of revenue values, but still relatively controlled (~$97.13 average).
- Email + Call: Has the widest revenue distribution, with a significantly higher median and many transactions generating revenues above the other two methods (~$183.65 average).
Outliers Identified
- There are several high-revenue transactions in Email + Call, likely from high-value customers.
- Some lower-end revenue values in all methods indicate that not all transactions are equally profitable.
Revenue Trends Over Time
- Email started with the highest revenue in the first week but experienced a consistent downward trend over time.
- Email + Call showed steady revenue growth, particularly from week 3 to week 5, surpassing Email in later weeks.
- Call consistently had the lowest revenue, with minimal fluctuations, indicating that it is the least effective method for sustained revenue generation.
Business Metric Definition
Recommended Metric: Customer Acquisition Efficiency
The recommended metric is Customer Acquisition Efficiency, which is the ratio of total revenue generated to the number of customer interactions per sales method.
Estimated Values:
- Email + Call: $158.73 per customer
- Email: $90.04 per customer
- Call: $45.86 per customer
Monitoring Strategy:
- Track weekly revenue and number of transactions.
- Adjust resource allocation based on performance trends.
Recommendation
Best Sales Method
- Email + Call demonstrated the highest revenue per transaction and significant growth in later weeks, making it the most effective method for maximizing revenue.
Alternative Approach
- Email initially had strong revenue but showed a declining trend over time. It remains a valuable method but may require adjustments to sustain long-term performance.
Least Effective Method
- Call has the lowest revenue return and showed minimal growth, indicating that it should be minimized or integrated with email campaigns for better results.
Action Plan
- Increase investment in combined email and call strategies, particularly for high-value customers.
- Reassess email-only campaigns to identify reasons for revenue decline and implement engagement improvements.
- Reduce standalone call efforts unless proven necessary for specific segments.