Data Analyst Professional Practical Exam Submission
📝 Task List
Your written report should include written text summaries and graphics of the following:
- Data validation:
  - Describe validation and cleaning steps for every column in the data
- Exploratory Analysis:
  - Include two different graphics showing single variables only to demonstrate the characteristics of data
  - Include at least one graphic showing two or more variables to represent the relationship between features
  - Describe your findings
- Definition of a metric for the business to monitor
  - How should the business use the metric to monitor the business problem
  - Can you estimate initial value(s) for the metric based on the current data
- Final summary including recommendations that the business should undertake
1 Introduction
# library imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline
We recently launched a new line of office stationery designed to foster creativity and enhance brainstorming capabilities. To determine the most effective sales approach for this product line, we tested three distinct strategies: email outreach, phone calls, and a combination of both methods. The goal of this analysis is to assess the performance of each strategy and offer insights to guide future sales efforts.
This report outlines the data validation and cleaning process, presents the results of the exploratory data analysis, defines a key metric for ongoing performance monitoring, and provides actionable recommendations. By leveraging data-driven insights, we aim to make informed decisions that enhance sales effectiveness and support the success of the new product line.
2 Data Importing and Validation
# import data
sales_data = pd.read_csv('product_sales.csv')
sales_data.head()
sales_data.shape
sales_data.info()
sales_data.describe()
# Check the value counts to ensure there are only 3 unique values
print(sales_data['sales_method'].value_counts())
# Define a mapping dictionary to correct the inconsistent values
sales_method_mapping = {
    'Email': 'Email',
    'Call': 'Call',
    'Email + Call': 'Email + Call',
    'em + call': 'Email + Call',
    'email': 'Email'
}
# Apply the mapping to the 'sales_method' column
sales_data['sales_method'] = sales_data['sales_method'].map(sales_method_mapping)
# Check the value counts to ensure there are only 3 unique values
print(sales_data['sales_method'].value_counts())
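One caveat with the mapping step: `Series.map` with a dictionary returns NaN for any label that is not a key in the dictionary, so a spelling not listed above would silently become a missing value. The following is a minimal sketch of a more defensive variant, not the method used in this report. The sample series, the `CANONICAL` lookup, and the helper name `normalize_sales_method` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the raw 'sales_method' column
raw = pd.Series(['Email', 'Call', 'Email + Call', 'em + call', 'email'])

# Lookup keyed on lower-cased, whitespace-trimmed labels (assumed spellings)
CANONICAL = {
    'email': 'Email',
    'call': 'Call',
    'email + call': 'Email + Call',
    'em + call': 'Email + Call',
}

def normalize_sales_method(labels: pd.Series) -> pd.Series:
    """Map raw labels to canonical ones; raise if any label is unmapped."""
    keys = labels.str.strip().str.lower()
    cleaned = keys.map(CANONICAL)
    # Labels that did not match any key come back as NaN from map()
    unknown = labels[cleaned.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f"Unmapped sales_method values: {list(unknown)}")
    return cleaned

cleaned = normalize_sales_method(raw)
print(cleaned.value_counts())
```

Raising on unmapped labels makes a new spelling visible at cleaning time instead of letting it surface later as an unexplained null.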
# find mean revenue for each sales method
mean_revenue_by_sales_method = sales_data.groupby('sales_method')['revenue'].mean()
print(mean_revenue_by_sales_method)
def replace_null_revenue(row):
    """
    Return the revenue for a single row, imputing the mean revenue of the
    row's 'sales_method' group when the revenue is null (NaN).
    Parameters:
    -----------
    row : pandas Series
        A single row of a pandas DataFrame containing the 'revenue' and 'sales_method' columns.
    Returns:
    --------
    float
        The original 'revenue' value if it is not null; otherwise the mean revenue of the
        corresponding 'sales_method' group, looked up in mean_revenue_by_sales_method.
    """
    if pd.isnull(row['revenue']):
        return mean_revenue_by_sales_method[row['sales_method']]
    else:
        return row['revenue']
# apply function to the revenue column
sales_data['revenue'] = sales_data.apply(replace_null_revenue, axis=1)
# check for any null values in the revenue column
print(sales_data['revenue'].isnull().sum())
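The same group-mean imputation can also be written without a row-wise `apply`, which tends to be slow on larger tables. The sketch below uses `groupby(...).transform('mean')` together with `fillna`. The small DataFrame is made-up sample data for illustration only and does not reflect values in product_sales.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real values come from product_sales.csv
df = pd.DataFrame({
    'sales_method': ['Email', 'Email', 'Email', 'Call', 'Call', 'Call'],
    'revenue': [100.0, np.nan, 90.0, 50.0, 40.0, np.nan],
})

# Group mean aligned to every row; NaN values are skipped by mean()
group_mean = df.groupby('sales_method')['revenue'].transform('mean')

# Fill only the missing entries with the mean of their own group
df['revenue'] = df['revenue'].fillna(group_mean)

print(df)
print(df['revenue'].isnull().sum())
```

Replacing `'mean'` with `'median'` in the `transform` call would give median imputation instead, which is less sensitive to outliers in revenue.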