Skip to content
0

Can you find a better way to segment your customers?

๐Ÿ“– Background

You work for a medical device manufacturer in Switzerland. Your company manufactures orthopedic devices and sells them worldwide. The company sells directly to individual doctors who use them on rehabilitation and physical therapy patients.

Historically, the sales and customer support departments have grouped doctors by geography. However, the region is not a good predictor of the number of purchases a doctor will make or their support needs.

Your team wants to use a data-centric approach to segmenting doctors to improve marketing, customer service, and product planning.

๐Ÿ’พ The data

The company stores the information you need in the following four tables. Some of the fields are anonymized to comply with privacy regulations.

Doctors contains information on doctors. Each row represents one doctor.
  • "DoctorID" - is a unique identifier for each doctor.
  • "Region" - the current geographical region of the doctor.
  • "Category" - the type of doctor, either 'Specialist' or 'General Practitioner.'
  • "Rank" - is an internal ranking system. It is an ordered variable: The highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and the lowest level is Silver.
  • "Incidence rate" and "R rate" - relate to the amount of re-work each doctor generates.
  • "Satisfaction" - measures doctors' satisfaction with the company.
  • "Experience" - relates to the doctor's experience with the company.
  • "Purchases" - purchases over the last year.
Orders contains details on orders. Each row represents one order; a doctor can place multiple orders.
  • "DoctorID" - doctor id (matches the other tables).
  • "OrderID" - order identifier.
  • "OrderNum" - order number.
  • "Conditions A through J" - map the different settings of the devices in each order. Each order goes to an individual patient.
Complaints collects information on doctor complaints.
  • "DoctorID" - doctor id (matches the other tables).
  • "Complaint Type" - the company's classification of the complaints.
  • "Qty" - number of complaints per complaint type per doctor.
Instructions has information on whether the doctor includes special instructions on their orders.
  • "DoctorID" - doctor id (matches the other tables).
  • "Instructions" - 'Yes' when the doctor includes special instructions, 'No' when they do not.
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(plotly))
suppressPackageStartupMessages(library(gridExtra))
doctors <- readr::read_csv('data/doctors.csv', show_col_types = FALSE)
doctors
orders <- readr::read_csv('data/orders.csv', show_col_types = FALSE)
orders
complaints <- readr::read_csv('data/complaints.csv', show_col_types = FALSE)
complaints
instructions <- readr::read_csv('data/instructions.csv', show_col_types = FALSE)
instructions

How many Doctors are in each Region?

# Defining Doctors Data Summarized by Region
docReg <- doctors %>%
	select(DoctorID, Region) %>%
	group_by(Region) %>% 
	summarize(total_doctors = n()) %>% 
	arrange(desc(total_doctors))

# Plotting Num Doctors per Region
docRegPlot <- ggplot(docReg, aes(x = reorder(Region, desc(total_doctors)), y = total_doctors)) +
	geom_col(aes(fill = total_doctors)) +
	theme_classic() +
	labs(title = 'Total Doctors by Region',
		 x = 'Region ID',
		 y = 'Number of Doctors') +
	guides(fill = 'none') +
	scale_y_continuous(breaks = seq(0, 35, by = 5)) +
	scale_fill_viridis_c(option = 'D') +
	coord_flip()

# Plotting top 10 Regions by Num of Doctors
docReg10Plot <- docReg[1:10,] %>% ggplot(., aes(x = reorder(Region, desc(total_doctors)), y = total_doctors)) +
	geom_col(aes(fill = total_doctors)) +
	theme_classic() +
	labs(title = 'Total Doctors by Region - Top 10',
		 x = 'Region ID',
		 y = 'Number of Doctors') +
	guides(fill = 'none') +
	scale_y_continuous(breaks = seq(0, 35, by = 5)) +
	scale_fill_viridis_c(option = 'D') +
	coord_flip()

docRegPlot; docReg10Plot

What is the Average Number of Purchases per Region?

# Summarizing doctors data by Purchases per region
purchReg <- doctors %>% 
	select(Region, Purchases) %>%
	group_by(Region) %>% 
	summarize(avg_purchases = sum(Purchases)/n()) %>%
	arrange(desc(avg_purchases))
purchReg

# Defining Plot of Average Num Purchases per REgion
avgPurchPlot <- ggplot(purchReg, aes(x = reorder(Region, avg_purchases), y = avg_purchases)) +
	geom_col(aes(fill = avg_purchases)) +
	theme_classic() +
	guides(fill = 'none') +
	labs(title = 'Average Purchases by Region',
		 x = 'Region ID',
		 y = 'Average Purchases') +
	scale_fill_viridis_c(option = 'D') +
	scale_y_continuous(breaks = seq(0, 129, by = 10)) +
	coord_flip()

# Defining Plot of 10 Regions by Avg num purchases
avgPurch10Plot <- ggplot(purchReg[1:10,], aes(x = reorder(Region, avg_purchases), y = avg_purchases)) +
	geom_col(aes(fill = avg_purchases)) +
	theme_classic() +
	guides(fill = 'none') +
	labs(title = 'Average Purchases by Region - Top 10',
		 x = 'Region ID',
		 y = 'Average Purchases') +
	scale_fill_viridis_c(option = 'D') +
	scale_y_continuous(breaks = seq(0, 129, by = 10)) +
	coord_flip()

avgPurchPlot; avgPurch10Plot

Is there a Relationship between Purchases and Complaints?

# Combining doctors and complaints datasets
purComp <- doctors %>% 
	select(DoctorID, Purchases) %>%
	right_join(complaints, by = 'DoctorID') %>%
	group_by(DoctorID) %>%
	summarize(Purchases = sum(Purchases)/n(),
			  tot_comp = sum(Qty))

# Defining NAs in purchase column to mean 0 purchases
purComp$Purchases[is.na(purComp$Purchases)] <- 0

# Calculating correlation between total complaints per doctor and total purchases by doctor
purComp %>%
	select(Purchases, tot_comp) %>% 
	cor()

# Plotting purchases by complaints
ggplot(purComp, aes(x = Purchases, y = tot_comp)) +
	geom_point() +
	theme_classic() +
	labs(title = 'Weak Correlation between Purchases and Total Complaints',
		 x = 'Number of Purchases',
		 y = 'Total Complaints')

Market Segmentation

Segmentation by Region Characteristics

The first segmentation is defined by the total doctors in each region. Regions were segmented into 5 categories. The second segementation further divided the region levels by identifying those which had above average purchases within their region group.

# Combining two regional datasets to compare doctors and purchase level
docReg2 <- docReg %>% 
	mutate(Region_Level = cut(total_doctors, breaks = seq(0, 35, by = 5))) %>%
	left_join(purchReg, by = 'Region') %>%
	group_by(Region_Level) %>%
	mutate(High_Purchase = avg_purchases > median(avg_purchases))

# Plotting Regions by total doctors, attention to region and purchase levels
ggplot(docReg2, aes(y = total_doctors, 
					x = reorder(Region, total_doctors), 
					alpha = factor(High_Purchase), 
					fill = factor(Region_Level))) +
	geom_col() +
	#geom_point(aes(y = avg_purchases, color = factor(High_Purchase))) + 
	theme_classic() +
	labs(fill = 'Region Level',
		 x = 'Region ID',
		 y = 'Total Doctors',
		 alpha = 'High Purchases',
		 title = 'Region Levels by Total Doctors, Highlighting above Avg. Purchases') +
	coord_flip() +
	scale_alpha_manual(values = c(.3,1)) +
	facet_grid(~ High_Purchase, scales = 'free_x')	

# Boiling down plot to contain only High Purchasers
docReg2 %>% filter(High_Purchase == TRUE) %>%
	ggplot(., aes(x = reorder(Region, total_doctors), y = total_doctors, fill = factor(Region_Level))) +
	geom_col() +
	theme_classic() +
	coord_flip() +
	labs(title = 'Regions by Total Doctors and Above Average Purchase',
		 x = 'Region ID',
		 y = 'Total Doctors',
		 fill = 'Region Bucket')

Segmentation by Doctor Characteristics

Doctors were initialize categorized by total purchases. The next segmentation identified that specialists contribute a significant more to purchases relative to general practitioners. Specialist doctors can be further segmented by whether they have an above average Rework level. After identifying specialists with high rework, doctors were further segmented by whether they had above average experience. The most and least experience doctors contribute about the same to total purchases across all purchase levels. Looking deeper into experienced specialists with high rework shows that those with the most purchases are some of most satisfied customers.

โ€Œ
โ€Œ
โ€Œ