
Data Analyst Professional Practical Exam Submission

I) Project Context

About Pens and Printers

Pens and Printers was founded in 1984 and provides high-quality office products to large organizations. They are a trusted provider of everything from pens and notebooks to desk chairs and monitors. They don’t produce their own products but sell those made by other companies.

They have built long-lasting relationships with their customers, who trust them to provide the best products. As the way consumers buy products changes, their sales tactics have to change too. Launching a new product line is expensive, and they need to make sure they are using the best techniques to sell the new product effectively. The best approach may vary for each new product, so they need to learn quickly what works and what doesn’t.

New Product Sales Methods

Six weeks ago they launched a new line of office stationery. Despite the world becoming increasingly digital, there is still demand for notebooks, pens and sticky notes.

Their focus has been on selling products that enable their customers to be more creative, with an emphasis on tools for brainstorming. They have tested three different sales strategies for this: targeted email, phone calls, and a combination of the two.

Email: Customers in this group received an email when the product line was launched, and a further email three weeks later. This required very little work for the team.

Call: Customers in this group were called by a member of the sales team. On average, team members spent around thirty minutes on the phone per customer.

Email and call: Customers in this group were first sent the product information email, then called a week later by the sales team to talk about their needs and how this new product may support their work. The email required little work from the team, and the call took around ten minutes per customer.
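The three methods differ mainly in how much sales-team time they consume. The sketch below turns the per-customer figures above into total team hours. It treats email work as 0 minutes, which is an assumption that approximates "very little work". `team_hours` and the 1,000-customer volume are hypothetical and used only for illustration.

```python
# Approximate sales-team minutes per customer, taken from the method descriptions.
# Assumption: email effort is treated as 0 minutes ("very little work").
MINUTES_PER_CUSTOMER = {
    "Email": 0,
    "Call": 30,
    "Email and call": 10,
}


def team_hours(method: str, n_customers: int) -> float:
    """Estimated sales-team hours needed to contact n_customers with one method."""
    return MINUTES_PER_CUSTOMER[method] * n_customers / 60


# Hypothetical example: contacting 1,000 customers with each method
for method in MINUTES_PER_CUSTOMER:
    print(f"{method}: {team_hours(method, 1000):.1f} hours")
```

Under these assumptions, the call-only method uses three times as much team time per customer as the combined method.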

📝 Objectives:

The written report we were asked to deliver had to include written summaries and graphics covering the following:

  • Data validation:
    • Describe validation and cleaning steps for every column in the data
  • Exploratory Analysis:
    • Include two different graphics showing single variables only to demonstrate the characteristics of the data
    • Include at least one graphic showing two or more variables to represent the relationship between features
    • Describe your findings
  • Definition of a metric for the business to monitor
    • How should the business use the metric to monitor the business problem?
    • Can you estimate initial value(s) for the metric based on the current data?
  • Final summary including recommendations that the business should undertake

II) Project Report

Data Validation: Step by Step

  1. Loaded the dataset and the relevant Python libraries, and identified missing values in the 'revenue' column
-- Load data from the Google Sheets file using SQL; the query result is stored as a DataFrame in the variable df
SELECT *
FROM 'product_sales (3)';
# Load all relevant Python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
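Step 1 flags missing values in 'revenue'. The sketch below quantifies them per column once the "NA" strings have been converted to NaN. It is a minimal illustration. The helper `missing_summary` and the three-row example frame are hypothetical; in the report, the helper would be called on `data`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Hypothetical three-row example (not the real dataset)
example = pd.DataFrame({
    "customer_id": ["a1", "b2", "c3"],
    "revenue": [52.0, np.nan, 91.5],
})
print(missing_summary(example))
```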
  2. Confirmed that the data fields and their types match the project description of the dataset
# Copy the query result into a DataFrame named data and replace the "NA" strings with NaN,
# then use info() to check each column's data type and number of non-null values
data = pd.DataFrame(df).replace("NA", np.nan)
data.info()
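Reading `info()` output by eye works for a few columns. The check can also be written against an expected schema. The sketch below assumes a mapping from column name to dtype kind ("O" object/string, "f" float, "i" integer). It covers only the three columns named in this report, and the expected kinds are assumptions. `dtype_mismatches` is a hypothetical helper.

```python
import pandas as pd

# Assumed expected dtype kind per column; extend with the remaining columns.
EXPECTED_KINDS = {
    "customer_id": "O",
    "revenue": "f",
    "years_as_customer": "i",
}


def dtype_mismatches(frame, expected):
    """Return {column: problem} for columns that are absent or of an unexpected kind."""
    problems = {}
    for column, kind in expected.items():
        if column not in frame.columns:
            problems[column] = "missing column"
        elif frame[column].dtype.kind != kind:
            problems[column] = str(frame[column].dtype)
    return problems


# Hypothetical example in which revenue was read as text
example = pd.DataFrame({
    "customer_id": ["a1", "b2"],
    "revenue": ["52.0", "91.5"],
    "years_as_customer": [3, 12],
})
print(dtype_mismatches(example, EXPECTED_KINDS))
```

An empty dictionary means every listed column has the expected kind.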
  3. Confirmed that all values in the 'customer_id' column are unique
# Count duplicated customer_id values (only False in the output means every id is unique)
check_duplicates = data.duplicated("customer_id").value_counts()

print(check_duplicates)
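The `value_counts()` output above gives only True/False totals. A shorter check is `Series.is_unique`. If duplicates do appear, `duplicated(..., keep=False)` lists every affected row. The sketch below uses a hypothetical helper `duplicate_ids` and a toy frame that are not part of the original analysis.

```python
import pandas as pd


def duplicate_ids(frame, column="customer_id"):
    """Return every row whose id appears more than once (empty if ids are unique)."""
    return frame[frame.duplicated(column, keep=False)]


# Hypothetical example with one repeated id
example = pd.DataFrame({"customer_id": ["a1", "b2", "a1"], "revenue": [10.0, 20.0, 30.0]})
print(example["customer_id"].is_unique)  # False
print(duplicate_ids(example))
```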
  4. Removed rows where 'years_as_customer' is greater than 40, since the company was founded 40 years ago (in 1984)
# Use describe() to spot abnormal values, such as a maximum years_as_customer above 40
data.describe()
# Find the rows with impossible years_as_customer values, count them, then drop them
over_40 = data[data['years_as_customer'] > 40].index
print(len(over_40))
data = data.drop(over_40)
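The 40-year cutoff comes from simple arithmetic: the current year minus the founding year (1984). The sketch below derives the bound instead of hard-coding it. `current_year=2024` is an assumption that matches "40 years ago". The helper `drop_impossible_tenure` and the example rows are hypothetical.

```python
import pandas as pd

FOUNDED = 1984  # founding year stated in the project context


def drop_impossible_tenure(frame, current_year, founded=FOUNDED):
    """Drop rows whose years_as_customer exceeds the company's age."""
    max_tenure = current_year - founded
    mask = frame["years_as_customer"] > max_tenure
    print(f"Dropping {int(mask.sum())} row(s) with years_as_customer > {max_tenure}")
    return frame.loc[~mask].copy()


# Hypothetical example; current_year=2024 is an assumption
example = pd.DataFrame({
    "customer_id": ["a1", "b2", "c3"],
    "years_as_customer": [5, 40, 63],
})
cleaned = drop_impossible_tenure(example, current_year=2024)
```

Passing the year as a parameter keeps the rule correct if the analysis is rerun later.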