Smart Segmenting: How can we better understand our customers?
Executive Summary
This report was prepared for a Switzerland-based medical device manufacturer. The company sells directly to individual doctors, who use the devices for rehabilitation and physical therapy patients. Historically, customers have been segmented by region; however, region has not been a reliable indicator of how many purchases a customer will make or what their support needs will be. This report aims to assist the marketing, customer service, and product teams by providing a data-centric roadmap. More specifically, it addresses the following questions:
- How many doctors are there in each region? What is the average number of purchases per region?
- Can you find a relationship between purchases and complaints?
- Define new doctor segments that help the company improve marketing efforts and customer service.
- Identify which features impact the new segmentation strategy the most.
- Describe which characteristics distinguish the newly defined segments.
💾 The data
The company stores the information you need in the following four tables. Some of the fields are anonymized to comply with privacy regulations.
Doctors contains information on doctors. Each row represents one doctor.
- "DoctorID" - is a unique identifier for each doctor.
- "Region" - the current geographical region of the doctor.
- "Category" - the type of doctor, either 'Specialist' or 'General Practitioner.'
- "Rank" - is an internal ranking system. It is an ordered variable: The highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and the lowest level is Silver.
- "Incidence rate" and "R rate" - relate to the amount of re-work each doctor generates.
- "Satisfaction" - measures doctors' satisfaction with the company.
- "Experience" - relates to the doctor's experience with the company.
- "Purchases" - purchases over the last year.
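Because "Rank" is described as an ordered variable, it can be useful to encode it so that comparisons and numeric methods respect that order. The following is a minimal sketch, assuming the CSV stores the rank labels exactly as spelled in the description above; `RANK_ORDER` and `encode_rank` are hypothetical names introduced for illustration, and the demo rows are made up.

```python
import pandas as pd

# Rank levels from lowest to highest, taken from the field description.
# Assumption: the labels in the data are spelled exactly like this.
RANK_ORDER = [
    "Silver", "Silver Plus", "Gold", "Gold Plus", "Platinum",
    "Platinum Plus", "Titanium", "Titanium Plus", "Ambassadors",
]

def encode_rank(ranks: pd.Series) -> pd.Series:
    """Map rank labels to integer codes (0 = Silver, 8 = Ambassadors).

    Missing values and labels not found in RANK_ORDER become <NA>.
    """
    cat = pd.Categorical(ranks, categories=RANK_ORDER, ordered=True)
    codes = pd.Series(cat.codes, index=ranks.index, dtype="Int64")
    # pandas uses code -1 for values outside the category list
    return codes.mask(codes == -1)

# Made-up labels, for illustration only
demo_ranks = pd.Series(["Gold", "Ambassadors", "Silver", None, "Unknown"])
demo_codes = encode_rank(demo_ranks)
print(demo_codes)
```

Integer codes treat the gaps between adjacent ranks as equal, which is a modelling choice rather than something the data description states.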
Orders contains details on orders. Each row represents one order; a doctor can place multiple orders.
- "DoctorID" - doctor id (matches the other tables).
- "OrderID" - order identifier.
- "OrderNum" - order number.
- "Conditions A through J" - map the different settings of the devices in each order. Each order goes to an individual patient.
Complaints collects information on doctor complaints.
- "DoctorID" - doctor id (matches the other tables).
- "Complaint Type" - the company's classification of the complaints.
- "Qty" - number of complaints per complaint type per doctor.
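Since "Qty" is recorded per complaint type per doctor, relating complaints to purchases first requires one total per doctor. The sketch below shows one way this might be done, assuming the column names match the descriptions above and assuming that a doctor with no rows in Complaints made zero complaints. The function names are hypothetical, the demo rows are made up, and the rank-based (Spearman) correlation is only one possible measure of association.

```python
import pandas as pd

def complaints_per_doctor(complaints: pd.DataFrame) -> pd.DataFrame:
    """Sum Qty across all complaint types for each doctor."""
    return (
        complaints.groupby("DoctorID", as_index=False)["Qty"]
        .sum()
        .rename(columns={"Qty": "TotalComplaints"})
    )

def purchases_vs_complaints(doctors: pd.DataFrame,
                            complaints: pd.DataFrame) -> pd.DataFrame:
    """Attach total complaints to each doctor's purchase count."""
    totals = complaints_per_doctor(complaints)
    merged = doctors[["DoctorID", "Purchases"]].merge(
        totals, on="DoctorID", how="left"
    )
    # Assumption: no complaint rows means zero complaints
    merged["TotalComplaints"] = merged["TotalComplaints"].fillna(0)
    return merged

def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return x.rank().corr(y.rank())

# Made-up rows, for illustration only
demo_doctors = pd.DataFrame({"DoctorID": ["A", "B", "C"],
                             "Purchases": [10, 20, 30]})
demo_complaints = pd.DataFrame({"DoctorID": ["A", "A", "C"],
                                "Complaint Type": ["X", "Y", "X"],
                                "Qty": [1, 2, 5]})
demo_merged = purchases_vs_complaints(demo_doctors, demo_complaints)
demo_rho = spearman(demo_merged["Purchases"], demo_merged["TotalComplaints"])
```

A left join keeps every doctor in the result, so doctors without complaints are not silently dropped from the comparison.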
Instructions has information on whether the doctor includes special instructions on their orders.
- "DoctorID" - doctor id (matches the other tables).
- "Instructions" - 'Yes' when the doctor includes special instructions, 'No' when they do not.
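All four tables share "DoctorID", so doctor-level attributes can be combined by joining on that key. As a hedged sketch, the 'Yes'/'No' field could be turned into a boolean flag and attached to the Doctors table as follows. It assumes at most one Instructions row per doctor; `add_instruction_flag` and `HasInstructions` are hypothetical names, and the demo rows are made up.

```python
import pandas as pd

def add_instruction_flag(doctors: pd.DataFrame,
                         instructions: pd.DataFrame) -> pd.DataFrame:
    """Left-join a boolean HasInstructions column onto the doctors table."""
    flags = instructions[["DoctorID"]].copy()
    # Values other than 'Yes'/'No' map to NaN rather than being guessed
    flags["HasInstructions"] = instructions["Instructions"].map(
        {"Yes": True, "No": False}
    )
    # validate raises if a DoctorID appears more than once in flags,
    # which would otherwise duplicate doctor rows in the result
    return doctors.merge(flags, on="DoctorID", how="left",
                         validate="many_to_one")

# Made-up rows, for illustration only
demo_doctors = pd.DataFrame({"DoctorID": ["A", "B", "C"]})
demo_instructions = pd.DataFrame({"DoctorID": ["A", "B"],
                                  "Instructions": ["Yes", "No"]})
demo_flagged = add_instruction_flag(demo_doctors, demo_instructions)
```

Doctors who do not appear in the Instructions table keep a missing flag here, because the data description does not say whether absence means 'No'.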
Importing libraries and data
## Importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statistics import mode
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans
sns.set()
## Importing data
doctors = pd.read_csv('data/doctors.csv')
orders = pd.read_csv('data/orders.csv')
complaints = pd.read_csv('data/complaints.csv')
instructions = pd.read_csv('data/instructions.csv')
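## Replace the '--' placeholder with NaN and convert Satisfaction to a numeric dtype
## (assumes every remaining value in the column is numeric)
doctors['Satisfaction'] = pd.to_numeric(doctors['Satisfaction'].replace('--', np.nan))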
doctors.info()
doctors.isna().sum() / len(doctors)
orders.info()
orders.isna().sum() / len(orders)
complaints.info()
complaints.isna().sum() / len(complaints)
instructions.info()
instructions.isna().sum() / len(instructions)
Analysis
Number of Doctors in each Region
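A minimal sketch of how the number of doctors and the average number of purchases per region might be computed, assuming the Doctors table has the "DoctorID", "Region", and "Purchases" columns described earlier. `region_summary` is a hypothetical helper, and the demo rows are made up.

```python
import pandas as pd

def region_summary(doctors: pd.DataFrame) -> pd.DataFrame:
    """Count distinct doctors and average their purchases within each region."""
    return (
        doctors.groupby("Region")
        .agg(Doctors=("DoctorID", "nunique"),
             AvgPurchases=("Purchases", "mean"))
        .sort_values("Doctors", ascending=False)
    )

# Made-up rows, for illustration only
demo_doctors = pd.DataFrame({
    "DoctorID": ["A", "B", "C"],
    "Region": ["R1", "R1", "R2"],
    "Purchases": [4, 6, 10],
})
demo_summary = region_summary(demo_doctors)
print(demo_summary)
```

On the real data, this would be called as `region_summary(doctors)`. Note that `mean` skips missing "Purchases" values, so a region's average reflects only doctors with a recorded purchase count.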