This dataset initially consists of 15000 rows and 8 columns. All columns were validated against the criteria outlined in the instruction table, with the following results:
- week: Data cleaning is not required. Meets all the specified requirements.
- sales_method: Data cleaning is needed. After adjusting the 'em + call' and 'email' to the correct format, the column consists of 3 distinct values ('Call', 'Email', 'Email + Call').
- customer_id: Data cleaning is not required. Meets all the specified requirements.
- nb_sold: Data cleaning is not required. Meets all the specified requirements.
- revenue: Data cleaning is needed. Removing the 'NA' values from the column.
- years_as_customer: Data cleaning is not required. Meets all the specified requirements.
- nb_site_visits: Data cleaning is not required. Meets all the specified requirements.
After undergoing the cleaning and validating process, the dataset only contains 13926 rows and 8 columns.
Business Goals
The primary business goal is to enhance sales effectiveness by identifying the best sales method, whether through Email, Call, or Email + Call, to successfully sell the new product line. This involves evaluating the efficiency and impact of each approach to determine which strategy generates the highest results with the optimal use of resources. By understanding the most effective technique, the company aims to maximize revenue, streamline its sales processes, and better address customer needs in a rapidly evolving market.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
product_sales = pd.read_csv('product_sales_cleaned.csv')
product_sales_count = product_sales.groupby("sales_method").size().reset_index(name="number_of_customers")
plt.figure(figsize=(8, 6))
sns.barplot(data=product_sales_count, x="sales_method", y="number_of_customers", palette="viridis")
plt.title("Number of Customers by Sales Method", fontsize=16, weight='bold')
plt.xlabel("Sales Method", fontsize=12, weight='bold')
plt.ylabel("Number of Customers", fontsize=12, weight='bold')
plt.xticks(fontsize=10)
plt.tight_layout()
plt.show()
The "Call" method had 4,781 customers (40%), the "Email" method reached 6,922 customers (58%), and the "Email + Call" method served 2,223 customers (19%). Even though fewer people are reached through "Email + Call," it delivered strong results. Let's break it down on the next chart.
plt.figure(figsize=(8, 6))
sns.boxplot(data=product_sales, y="revenue", palette="viridis")
plt.title("Overall Revenue Distribution", fontsize=16, weight='bold')
plt.ylabel("Revenue", fontsize=12)
plt.grid(axis='y', linestyle='-', alpha=0.7)
plt.tight_layout()
plt.show()
plt.figure(figsize=(8, 6))
sns.boxplot(data=product_sales, x="sales_method", y="revenue", palette="viridis")
plt.title("Revenue Distribution by Sales Method", fontsize=16, weight='bold')
plt.xlabel("Sales Method", fontsize=12, weight='bold')
plt.ylabel("Revenue", fontsize=12, weight='bold')
plt.xticks(fontsize=10)
plt.grid(axis='y', linestyle='-', alpha=0.7)
plt.tight_layout()
plt.show()
Overall, the majority of the revenue distribution lay between $52 to $110 per customer. The "Call" method brought in $48 in revenue per customer, the "Email" method earned $98 per customer, and the "Email + Call" method outperformed the rest with $179 per customer. Clearly, combining email and calls brings the best results.
plt.figure(figsize=(12, 6))
sns.lineplot(data=product_sales, x="week", y="revenue", hue="sales_method", palette="viridis", marker="o")
plt.title("Revenue Trends Over Time by Sales Method", fontsize=16, weight='bold')
plt.xlabel("Week", fontsize=12, weight='bold')
plt.ylabel("Revenue", fontsize=12, weight='bold')
plt.legend(title="Sales Method", fontsize=10)
plt.tight_layout()
plt.show()
Within 6 weeks, the "Call" method showed steady but slower growth, the "Email" method stayed consistent, and the "Email + Call" method had big spikes in revenue, proving it can deliver strong returns when needed most.
state_revenue = (
product_sales.groupby("state")["revenue"].sum().reset_index(name="total_revenue").sort_values(by="total_revenue", ascending=False))
top_3_state = state_revenue.head(3)
plt.figure(figsize=(8, 6))
sns.barplot(data=top_3_state, x="state", y="total_revenue", palette="viridis")
plt.title("Top 3 States by Revenue Contribution", fontsize=16, weight="bold")
plt.xlabel("State", fontsize=12, weight="bold")
plt.ylabel("Total Revenue", fontsize=12, weight="bold")
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.grid(axis='y', linestyle='-', alpha=0.7)
plt.gca().set_axisbelow(True)
plt.tight_layout()
plt.show()
California contributed the most to total revenue, with an amount of $161,000, indicating a significant market potential. This was followed by Texas at $106,000 and New York at $83,000.
top_3_states_list = top_3_state["state"].tolist()
filtered_data = product_sales[product_sales["state"].isin(top_3_states_list)]
avg_revenue = (filtered_data.groupby(["state", "sales_method"])["revenue"].mean().reset_index())
avg_revenue["state"] = pd.Categorical(avg_revenue["state"], categories=["California", "Texas", "New York"], ordered=True)
plt.figure(figsize=(10, 6))
sns.barplot(data=avg_revenue, x="state", y="revenue", hue="sales_method", palette="viridis")
plt.title("Average Revenue by Sales Method in Top 3 States", fontsize=16, weight="bold")
plt.xlabel("State", fontsize=12, weight="bold")
plt.ylabel("Average Revenue", fontsize=12, weight="bold")
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.grid(axis='y', linestyle='-', alpha=0.7)
plt.gca().set_axisbelow(True)
plt.tight_layout()
plt.show()
Overall, the "Email + Call" sales method is the most effective for generating revenue in all three states.
Business Metrics
To align with our business goal, tracking 'Revenue by Sales Method' is crucial to understanding the impact of each sales approach. Monthly monitoring of this metric helps identify trends and optimize sales strategies. The analysis shows that the "Email + Call" method is the most profitable approach, delivering the highest returns compared to other methods.
Conclusion
Based on the last 6 weeks of data, the revenue per customer for each method is:
- Call: $48
- Email: $98
- Email + Call: $179
The "Email + Call" method generates the highest revenue per customer, making it the most effective approach. Taking time and effort into account, it requires just 10 minutes of call time per customer, combined with a simple email, leading to an optimal balance of efficiency and results. Compared to the "Call" method, which needs 30 minutes per customer with lower returns, and the "Email" method, which requires no calls but has less revenue potential, "Email + Call" stands out as the best choice for both profitability and time efficiency.