Superior customer segmentation to the status quo

Can you find a better way to segment your customers?

📖 Background

You work for a medical device manufacturer in Switzerland. Your company manufactures orthopedic devices and sells them worldwide. The company sells directly to individual doctors who use them on rehabilitation and physical therapy patients.

Historically, the sales and customer support departments have grouped doctors by geography. However, the region is not a good predictor of the number of purchases a doctor will make or their support needs.

Your team wants to use a data-centric approach to segmenting doctors to improve marketing, customer service, and product planning.

2 hidden cells

Hidden code

Q1.1 How many doctors are there in each region?

df = doctors.groupby('Region')['DoctorID'].count().reset_index().sort_values(by="DoctorID", ascending = False)
df.columns = ['Region', 'Count']
df.head(10)

plt.figure(figsize = (16, 6))
sns.barplot(data = df, x = "Region", y = "Count")
plt.xticks(rotation = 90)
plt.tight_layout()

df = doctors.groupby(['Region','Category'])['DoctorID'].count().reset_index()
df.columns = ['Region', 'Category', 'Count']
df = pd.pivot(data = df, index='Region', columns='Category', values='Count').sort_values(by = "Specialist", ascending = False).fillna(0)
df = df.reset_index()
df = df.set_index("Region")
df['Total'] = df['General Practitioner'] + df['Specialist']
df = df.sort_values(by = "Total", ascending = False)
df.head()

df = doctors.groupby(['Region','Category'])['DoctorID'].count().reset_index()
df.columns = ['Region', 'Category', 'Count']
df = df.sort_values(by = "Count", ascending = False)
plt.figure(figsize = (16, 6))
sns.barplot(data = df, x = "Region", y = "Count", hue = "Category")
plt.xticks(rotation = 90)
plt.tight_layout()

Q1.2 What is the average number of purchases per region?

keep_cols = ['Purchases','Region']
df = doctors[keep_cols]
df = df.groupby('Region').agg({'Purchases':['mean','count','sum']}).reset_index()
df.columns = list(map("_".join, df.columns))
df.columns = ['Region', 'Average', 'Count', 'Sum']
df.sort_values(by = "Average", ascending = False).head(10)

plt.figure(figsize = (16, 6))
df = df.sort_values(by = "Average", ascending = False)
sns.barplot(data = df, x = 'Region', y = 'Average')
plt.xticks(rotation = 90)
plt.tight_layout()

plt.figure(figsize = (16, 6))
sns.scatterplot(data = df, x = 'Count', y = 'Average')
plt.tight_layout()

threshold = 5
df2 = df[df['Count'] > threshold].sort_values(by = "Average", ascending = False)
df2.head(10)

plt.figure(figsize = (16, 6))
sns.scatterplot(data = df2, x = 'Count', y = 'Average')
plt.tight_layout()

Q2 Can you find a relationship between Purchases and Complaints?