CUSTOMER SEGMENTATION REPORT

INTRODUCTION

Dividing customers into groups, or segments, makes it possible to identify high-yield segments (those likely to be the most profitable) so that they can be selected for special attention and become our target markets. Segmentation is not only a way to find profitable segments; it also lets us build profiles of key segments to better understand their needs and purchase motivations. It helps us develop a relationship with the audience most likely to buy and interact with our products. There are several approaches to customer segmentation, including demographic, geographic, behavioral, and psychographic segmentation.

ANALYSIS

Historically, our customers were segmented geographically, by region. The results were neither efficient nor reliable, because this method is mainly useful for small enterprises targeting a local audience. We therefore segmented our customers using the demographic approach, because it enables the discovery of new and previously unknown segments in a large dataset such as ours. This approach lets us discover which groups of doctors purchase our equipment the most. Using R, we calculated the total number of doctors and the average number of purchases for each region to determine which regions purchased more equipment.

From the analysis, shown in the table, regions with many doctors do not necessarily have a high number of purchases. Geographic segmentation is therefore not a valid approach to creating customer segments.

Some customers complained about the products they purchased, which raised the question: "Do these complaints have any effect on the number of purchases made by our customers?" A scatterplot was used to visualize the relationship between purchases and complaints. As seen in the image below, there is no apparent relationship between the two. A Pearson product-moment correlation test gave a coefficient of 0.013, indicating essentially no linear relationship between the two variables. The complaints made by our customers therefore appear to have no effect on the number of purchases they make. This does not rule out their impact on our enterprise, since neglecting complaints could lead customers to leave.

To create new customer segments, a cluster analysis was used to group objects with similar characteristics into clusters. The k-means clustering method was chosen because our data does not have predefined categories or groups, and this analysis helps find groups in the data. The optimal number of clusters was selected using the elbow method, which computes the sum of squared distances between each point and the centroid of its cluster for a range of cluster counts. This gave 5 clusters, as seen in the image.
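The elbow selection described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code behind the reported result: the matrix `toy`, the seed, and the range of k from 1 to 8 are assumptions chosen for the example. For each candidate k it records the total within-cluster sum of squares reported by `kmeans()`; the "elbow" is the k after which the curve flattens.

```r
# Elbow-method sketch on synthetic data (illustrative only; `toy` is not the doctors table).
set.seed(42)

# Three well-separated 2-D blobs, 50 points each
centers <- rbind(c(0, 0), c(6, 6), c(0, 6))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1], 0.5), rnorm(50, centers[i, 2], 0.5))
}))
toy <- scale(toy)  # k-means is distance based, so put features on a common scale

# Total within-cluster sum of squares for k = 1..8
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# The elbow is where the curve stops dropping sharply
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Using several random starts (`nstart`) is a common precaution, since a single k-means run can settle in a poor local optimum and distort the curve.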

A cluster plot was also used to visualize the observations in each cluster. It takes the k-means result and the original data as arguments. In the plot below, each cluster and the observations belonging to it are shown in a distinct color.
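The report does not name the plotting function. Its description (k-means result plus original data as arguments) resembles `fviz_cluster()` from the factoextra package, but that is an assumption. The sketch below shows the same idea in base R on synthetic data: project the observations onto the first two principal components and color each point by its cluster.

```r
# Cluster-plot sketch in base R (illustrative; `toy` is synthetic, not the doctors data).
set.seed(1)
toy <- scale(rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 3),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 3)
))

km <- kmeans(toy, centers = 2, nstart = 25)

# Project onto the first two principal components so the clusters can be drawn in 2-D
pcs <- prcomp(toy)$x[, 1:2]

plot(pcs, col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2", main = "k-means clusters")
legend("topright", legend = paste("Cluster", sort(unique(km$cluster))),
       col = sort(unique(km$cluster)), pch = 19)
```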

The following features impacted the new segmentation strategy:
  • Category - the type of doctor, either Specialist or General Practitioner.
  • Rank - an internal ranking system where the highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and the lowest level is Silver.
  • Incidence rate and R rate - these relate to the amount of re-work each doctor generates.
The features with the greatest impact on the new segments were the rank and category of doctors. These features were selected as part of the demographic segmentation approach, which covers features such as age, marital status, income, gender, and professional occupation. Here, professional occupation, represented in our dataset by rank and category, shaped the new segmentation strategy. The characteristics that distinguish the newly defined segments can be interpreted as follows:

  1. Specialist doctors with the highest rank (Ambassadors)
  2. General practitioners with the Silver Plus rank
  3. General practitioners with the Gold rank
  4. Specialist doctors with the Platinum rank
  5. Specialist doctors with the Titanium rank
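One way to arrive at segment labels like these is to summarize each cluster by its most common Category and Rank. The sketch below assumes a hypothetical data frame `assigned` with made-up rows; in practice the `cluster` column would come from the k-means result. The helper `modal_value()` is introduced here for illustration only.

```r
# Segment-profiling sketch (hypothetical data; cluster labels would come from kmeans()).
assigned <- data.frame(
  cluster  = c(1, 1, 1, 2, 2, 2, 3, 3),
  Category = c("Specialist", "Specialist", "General Practitioner",
               "General Practitioner", "General Practitioner", "Specialist",
               "Specialist", "Specialist"),
  Rank     = c("Ambassadors", "Ambassadors", "Gold",
               "Silver Plus", "Silver Plus", "Gold",
               "Platinum", "Platinum")
)

# Most frequent value in a vector
modal_value <- function(x) names(which.max(table(x)))

# One row per cluster: size plus dominant category and rank
profile <- data.frame(
  cluster  = sort(unique(assigned$cluster)),
  size     = as.vector(table(assigned$cluster)),
  category = as.vector(tapply(assigned$Category, assigned$cluster, modal_value)),
  rank     = as.vector(tapply(assigned$Rank, assigned$cluster, modal_value))
)
profile
```

A dominant value can hide a mixed cluster, so it is worth checking the full cross-tabulation (for example `table(assigned$cluster, assigned$Rank)`) before naming a segment.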

CONCLUSION AND RECOMMENDATION

The objective of a customer segmentation analysis is to define our target audience or market and to monitor the progress of our product through its lifecycle. Our analysis showed that the highest-ranking specialist doctors purchase the most of our medical equipment, followed by general practitioners with the Silver Plus and Gold ranks. Our target audience has therefore been defined. We recommend focusing on specialist doctors with the highest rank and on general practitioners with the Silver Plus and Gold ranks, ensuring they have a smooth purchase experience and no cause for complaint about our customer service. Interviews should be carried out with these target customers to find out what could be added to improve our customer support system.

💾 The data

The company stores the information you need in the following four tables. Some of the fields are anonymized to comply with privacy regulations.

Doctors contains information on doctors. Each row represents one doctor.
  • "DoctorID" - is a unique identifier for each doctor.
  • "Region" - the current geographical region of the doctor.
  • "Category" - the type of doctor, either 'Specialist' or 'General Practitioner.'
  • "Rank" - is an internal ranking system. It is an ordered variable: The highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and the lowest level is Silver.
  • "Incidence rate" and "R rate" - relate to the amount of re-work each doctor generates.
  • "Satisfaction" - measures doctors' satisfaction with the company.
  • "Experience" - relates to the doctor's experience with the company.
  • "Purchases" - purchases over the last year.
Orders contains details on orders. Each row represents one order; a doctor can place multiple orders.
  • "DoctorID" - doctor id (matches the other tables).
  • "OrderID" - order identifier.
  • "OrderNum" - order number.
  • "Conditions A through J" - map the different settings of the devices in each order. Each order goes to an individual patient.
Complaints collects information on doctor complaints.
  • "DoctorID" - doctor id (matches the other tables).
  • "Complaint Type" - the company's classification of the complaints.
  • "Qty" - number of complaints per complaint type per doctor.
Instructions has information on whether the doctor includes special instructions on their orders.
  • "DoctorID" - doctor id (matches the other tables).
  • "Instructions" - 'Yes' when the doctor includes special instructions, 'No' when they do not.
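Because Rank is described above as an ordered variable, encoding it as an ordered factor in R keeps that ordering available for comparisons and for conversion to a numeric score. The sketch below is illustrative: `ranks_seen` is a made-up sample standing in for the real column, and the level strings are copied from the description above, so they are assumed to match the spelling in the file. Any value that does not match a level becomes `NA`.

```r
# Encode Rank as an ordered factor, lowest to highest, following the description above.
rank_levels <- c("Silver", "Silver Plus", "Gold", "Gold Plus", "Platinum",
                 "Platinum Plus", "Titanium", "Titanium Plus", "Ambassadors")

# Made-up sample values standing in for doctors$Rank
ranks_seen <- c("Gold", "Ambassadors", "Silver Plus", "Titanium")
rank_factor <- factor(ranks_seen, levels = rank_levels, ordered = TRUE)

# Comparisons respect the ranking
above_gold <- rank_factor > "Gold"
above_gold

# Integer score: 1 = Silver ... 9 = Ambassadors
rank_score <- as.integer(rank_factor)
rank_score
```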
doctors <- readr::read_csv('data/doctors.csv', show_col_types = FALSE)

doctors
 
orders <- readr::read_csv('data/orders.csv', show_col_types = FALSE)

orders
complaints <- readr::read_csv('data/complaints.csv', show_col_types = FALSE)

complaints
# Number of doctors in each region
library(dplyr)
doctors %>%
  group_by(Region) %>%
  summarize(n = n())
# Average number of purchases per region
doctors %>%
  group_by(Region) %>%
  summarize(avg_purchases = mean(Purchases))
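# Join doctors to their complaints. Complaints has one row per complaint type
# per doctor, so a doctor can appear on several rows of the result.
doctors_merged <- doctors %>%
  inner_join(complaints, by = "DoctorID")

doctors_merged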
# Relationship between purchases and complaint quantity (scatter plot with linear fit)
library(ggplot2)
ggplot(doctors_merged, aes(x = Purchases, y = Qty)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
# Pearson correlation between purchases and complaint quantity
cor(doctors_merged$Purchases, doctors_merged$Qty,
    use = "complete.obs", method = "pearson")
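Because the complaints table holds one row per complaint type per doctor, the join above repeats a doctor's Purchases once for each complaint type. One alternative, sketched here on made-up data, is to sum Qty per doctor first and then run `cor.test()`, which reports a p-value alongside the coefficient. The tables `toy_doctors` and `toy_complaints` are hypothetical, and this is not a re-computation of the reported result.

```r
# Per-doctor correlation sketch (made-up data; not the notebook's result).
toy_doctors <- data.frame(
  DoctorID  = c("A", "B", "C", "D", "E"),
  Purchases = c(10, 4, 25, 7, 13)
)
toy_complaints <- data.frame(
  DoctorID = c("A", "A", "B", "C", "D", "D", "E"),
  Qty      = c(1, 2, 1, 3, 1, 1, 2)
)

# Collapse to one row per doctor before joining
total_complaints <- aggregate(Qty ~ DoctorID, data = toy_complaints, FUN = sum)
per_doctor <- merge(toy_doctors, total_complaints, by = "DoctorID")

# Pearson correlation with a significance test
result <- cor.test(per_doctor$Purchases, per_doctor$Qty, method = "pearson")
result$estimate
result$p.value
```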
# Simplify dataset: keep columns 3 to 9, then drop Satisfaction, Purchases and Experience
new_doctors <- doctors[, 3:9] %>%
  select(-Satisfaction, -Purchases, -Experience)
new_doctors
unique(new_doctors$Category)
unique(new_doctors$Rank)
# Building dataset: isolate the numeric columns
numeric <- new_doctors %>%
  select(where(is.numeric))

numeric
