Customer Segmentation: More Than Just A Phrase
    1. Introduction

    1.1. Outline

    Our company manufactures orthopedic devices and sells them worldwide. The company sells directly to individual doctors who use them on rehabilitation and physical therapy patients. Historically, the sales and customer support departments have grouped doctors by geography. However, the region is not a good predictor of the number of purchases a doctor will make or their support needs. We want to use a data-centric approach to segmenting doctors to improve marketing, customer service, and product planning. Our main goal throughout this analysis is composed of three objectives:

    1. The number of doctors and their average number of purchases per region
    2. The relationship between purchases and complaints
    3. New customer segmentation

    1.2. The data

    The company stores the information you need in the following four tables. Some of the fields are anonymized to comply with privacy regulations.

    Doctors contains information on doctors. Each row represents one doctor.
    • "DoctorID" - is a unique identifier for each doctor.
    • "Region" - the current geographical region of the doctor.
    • "Category" - the type of doctor, either 'Specialist' or 'General Practitioner.'
    • "Rank" - is an internal ranking system. It is an ordered variable: The highest level is Ambassadors, followed by Titanium Plus, Titanium, Platinum Plus, Platinum, Gold Plus, Gold, Silver Plus, and the lowest level is Silver.
    • "Incidence rate" and "R rate" - relate to the amount of re-work each doctor generates.
    • "Satisfaction" - measures doctors' satisfaction with the company.
    • "Experience" - relates to the doctor's experience with the company.
    • "Purchases" - purchases over the last year.
    Orders contains details on orders. Each row represents one order; a doctor can place multiple orders.
    • "DoctorID" - doctor id (matches the other tables).
    • "OrderID" - order identifier.
    • "OrderNum" - order number.
    • "Conditions A through J" - map the different settings of the devices in each order. Each order goes to an individual patient.
    Complaints collects information on doctor complaints.
    • "DoctorID" - doctor id (matches the other tables).
    • "Complaint Type" - the company's classification of the complaints.
    • "Qty" - number of complaints per complaint type per doctor.
    Instructions has information on whether the doctor includes special instructions on their orders.
    • "DoctorID" - doctor id (matches the other tables).
    • "Instructions" - 'Yes' when the doctor includes special instructions, 'No' when they do not.

    1.3. Methods

    1.3.1. At the first stage, a preliminary data transformation was conducted including:
    • Changing data types to more suitable formats
    • Regrouping sparse categorical variables
    • Filling missing values with the most appropriate option
    • Merging the datasets into a single dataset for the next step
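
    The merging step above can be sketched with pandas as follows. This is a minimal illustration on invented toy rows, not the report's actual code: the column names CurrentOrders and TotalComplaints are hypothetical, and filling absent orders and complaints with zero is an assumption about what "the most appropriate option" means for those fields.

```python
import pandas as pd

# Toy stand-ins for the four tables; all values are invented for illustration.
doctors = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3"],
    "Region": ["R1", "R1", "R2"],
    "Category": ["Specialist", "General Practitioner", "Specialist"],
    "Purchases": [10.0, 4.0, 25.0],
})
orders = pd.DataFrame({
    "DoctorID": ["D1", "D1", "D3"],
    "OrderID": ["O1", "O2", "O3"],
})
complaints = pd.DataFrame({
    "DoctorID": ["D1", "D1", "D3"],
    "Complaint Type": ["Correct", "Incorrect", None],
    "Qty": [2, 1, 1],
})
instructions = pd.DataFrame({
    "DoctorID": ["D1", "D2"],
    "Instructions": ["Yes", "No"],
})

# Collapse the one-to-many tables to one row per doctor before joining.
# "CurrentOrders" and "TotalComplaints" are hypothetical column names.
order_counts = (
    orders.groupby("DoctorID").size().rename("CurrentOrders").reset_index()
)
complaint_totals = (
    complaints.groupby("DoctorID", as_index=False)["Qty"].sum()
    .rename(columns={"Qty": "TotalComplaints"})
)

# Left joins keep every doctor, even those without orders or complaints.
merged = (
    doctors
    .merge(order_counts, on="DoctorID", how="left")
    .merge(complaint_totals, on="DoctorID", how="left")
    .merge(instructions, on="DoctorID", how="left")
)

# Assumption: a doctor absent from a table has zero orders or complaints.
count_cols = ["CurrentOrders", "TotalComplaints"]
merged[count_cols] = merged[count_cols].fillna(0).astype(int)

# Store the doctor type as a categorical dtype.
merged["Category"] = merged["Category"].astype("category")
```

    Aggregating before joining keeps one row per doctor, so purchases are not duplicated by doctors who placed several orders.
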
    1.3.2. For the second half of the report, a data pipeline was introduced to segment the customers for further insights. This pipeline includes:
    • Selecting the candidate features for the next step
    • Dividing the features into numerical and categorical variables
    • Transforming each numerical variable to a logarithmic scale and then normalizing it, with missing values imputed using the median
    • One-hot encoding each categorical variable, with missing values imputed using the most frequent category
    • Reducing all the transformed features to 4 main components using Kernel Principal Component Analysis
    • Running k-means clustering on the components from the previous stage
    • Repeating the previous steps with different feature candidates and number of clusters
    • Selecting the final segments
    • Extracting the feature importance
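
    A pipeline of the kind listed above could look roughly like the following scikit-learn sketch. It runs on synthetic data and makes several assumptions the report does not state: imputation happens before the log transform, "normalized" is read as standardization, the kernel is RBF, and log1p is used so that zero counts stay finite.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import KernelPCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Synthetic stand-in for the merged doctors table (values are invented).
rng = np.random.default_rng(0)
n = 60
data = pd.DataFrame({
    "Experience": rng.gamma(2.0, 1.0, n),
    "Purchases": rng.poisson(8, n).astype(float),
    "Category": rng.choice(["Specialist", "General Practitioner"], n),
})
data.loc[::10, "Purchases"] = np.nan  # a few missing numeric values
data.loc[::15, "Category"] = np.nan   # a few missing categories

numeric = ["Experience", "Purchases"]
categorical = ["Category"]

# Numerical branch: median imputation, log scale, then standardization.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("log", FunctionTransformer(np.log1p)),  # assumption: log1p, not plain log
    ("scale", StandardScaler()),             # assumption: "normalized" = z-score
])
# Categorical branch: most-frequent imputation, then one-hot encoding.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

segmenter = Pipeline([
    ("prep", ColumnTransformer(
        [("num", numeric_steps, numeric),
         ("cat", categorical_steps, categorical)],
        sparse_threshold=0,  # always hand a dense array to KernelPCA
    )),
    ("kpca", KernelPCA(n_components=4, kernel="rbf")),  # assumption: RBF kernel
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

labels = segmenter.fit_predict(data)  # one segment label per doctor
```

    Wrapping every step in one Pipeline makes it cheap to repeat the run with other feature candidates or cluster counts, which is what the later bullets describe.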

    1.4. Executive summary:

    • The geographical regions with a higher proportion of specialists had higher average purchases, but the total number of doctors in a region does not drive higher average demand.
    • There was no obvious pattern between doctors' purchases last year and their complaint records. However, every customer who has placed an order this year has made at least one complaint. This is very concerning: before our customer service gets overwhelmed, we need to establish whether there is a problem with our current devices and tackle it as fast as we can.
    • A new customer segmentation with three segments was introduced. Each customer group shows a distinct behavioral pattern and its own set of characteristics. The following summary lists each group's purchase and complaint style, along with suggestions to better support them.
    Category: Demanding Purchasers
    Why should we cherish them?
    • They made the highest average number of purchases last year.
    • They have the highest average number of orders now.
    • They have the highest level of satisfaction with our company among the three groups.
    In what areas do they need more support from us?
    • They generally tend to complain more.
    • They produce the highest level of rework.
    How can we make them happier?
    • Find their preferred communication style and set regular check-ups.
    • Tag them as high priority in our support system.
    • Regularly ask for their feedback.

    Category: Devoted Veterans
    Why should we cherish them?
    • They have the most experience with our company.
    • They have a low proportion of complaints.
    • They generate the least rework.
    • They have the highest proportion of general practitioners (20%).
    In what areas do they need more support from us?
    • They are on average the least satisfied of the three groups.
    • They had the lowest number of purchases last year and orders this year.
    How can we make them happier?
    • Create a customer loyalty program and make sure to include them.
    • Offer them our loyalty discounts.
    • Notify them first about our promotions.

    Category: Conservative Freshmen
    Why should we cherish them?
    • They have the lowest proportion of complaints.
    • They have the lowest average rework in terms of R rate.
    In what areas do they need more support from us?
    • They have the least experience with our company so far.
    • On average, they have a low number of purchases from the previous year and orders now.
    How can we make them happier?
    • Share our previous customers' success stories with them.
    • Ask them to join our social media.
    • Encourage them to voice their needs.

    2. Objectives

    2.1. The number of doctors and their average number of purchases per region

    We were particularly interested in the relationship between doctors and their regions, and in whether any particular region shows high demand. According to figure 1, the number of doctors in each region does not seem to be related to the total number of purchases in that area. Two regions with as few as one doctor each had the highest average number of purchases last year. The correlation between the size of a region and its average number of purchases supports this: the Pearson correlation is -0.13. It seems that a region's demand comes from each doctor's individual needs rather than from doctors in particular regions having a higher tendency to purchase.
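
    The correlation check described above can be reproduced along these lines. This is a sketch on invented rows, so it will not return the report's -0.13; the helper column names n_doctors and avg_purchases are hypothetical.

```python
import pandas as pd

# Invented toy rows; "Region" and "Purchases" mirror the Doctors table fields.
doctors = pd.DataFrame({
    "Region": ["A", "A", "A", "B", "B", "C"],
    "Purchases": [5, 7, 6, 10, 12, 40],
})

# One row per region: how many doctors it has and their average purchases.
per_region = doctors.groupby("Region")["Purchases"].agg(
    n_doctors="count", avg_purchases="mean"
)

# Series.corr computes the Pearson correlation by default.
r = per_region["n_doctors"].corr(per_region["avg_purchases"])
```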

    Since the region names were anonymized, we could not test whether geographically close regions show similar purchasing behavior.

    A feature that makes regions more interesting to look at is the proportion of specialists to general practitioners, which varies widely across regions in our dataset. This could be one of the reasons geographical segmentation was chosen in our previous marketing strategy, since specialists' needs can be quite different from those of general practitioners. Based on last year's data, specialists tend to make more purchases than general practitioners, and they hold higher positions in our ranking system. The difference between the specialists' average number of purchases (11.6) and that of general practitioners (6.4) is significant at the 0.05 level under a two-sample t-test with unequal variances.
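
    A two-sample t-test with unequal variances (Welch's t-test) is available in SciPy through the equal_var=False option. The sketch below uses invented purchase counts purely to show the call; it does not reproduce the group means of 11.6 and 6.4.

```python
from scipy import stats

# Invented purchase counts for the two doctor categories.
specialist_purchases = [12, 15, 9, 14, 11, 13, 10, 16]
gp_purchases = [6, 5, 8, 7, 4, 6]

# equal_var=False selects Welch's t-test, which does not assume that the
# two groups share the same variance.
result = stats.ttest_ind(specialist_purchases, gp_purchases, equal_var=False)

# Compare the two-sided p-value against the 0.05 significance level.
significant = result.pvalue < 0.05
```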

    We infer that regions with a higher proportion of specialists had higher average purchases, but the total number of doctors in a region does not drive higher average demand.

    N.B. To get a more consistent plot that is not affected by noise, the data from region "1 19 20" was omitted. This region has a single specialist who purchased 120 items last year, a pattern unlike the rest of the data.

    2.2. The relationship between purchases and complaints

    A company that listens to its customers' unmet needs can easily stand out from the others. Knowing how often customers complain, and about what, gives us a substantial advantage in offering more personalized customer service.

    2.2.1. A preview of our complaints statistics

    • The total number of complaints recorded per doctor ranges from 0 to 19.
    • 16% of all doctors have made a complaint.
    • 56% of the doctors who made a complaint have more than one complaint on record.
    • 55% of all recorded complaints were correct, 26% were falsely declared, and for 18% the status was not reported.
    • 90% of the doctors who have made a complaint are Specialists, and 97% of all complaints were made by Specialists.
    • Some customers have more complaints in total than items currently ordered. This is probably due to multiple complaints about one item, or complaints about previously purchased items.
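
    Statistics of this kind can be derived from the Complaints table roughly as follows. The rows and complaint type labels are invented for illustration, so the resulting shares differ from the figures above; the variable names are hypothetical.

```python
import pandas as pd

# Invented toy rows; column names follow the Doctors and Complaints tables.
doctors = pd.DataFrame({
    "DoctorID": ["D1", "D2", "D3", "D4", "D5"],
    "Category": ["Specialist", "Specialist", "General Practitioner",
                 "Specialist", "General Practitioner"],
})
complaints = pd.DataFrame({
    "DoctorID": ["D1", "D1", "D2", "D9"],  # D9 is missing from the doctors table
    "Complaint Type": ["Correct", "Incorrect", "Correct", None],
    "Qty": [3, 1, 1, 2],
})

# Keep only complaints from doctors that exist in the doctors table.
known = complaints[complaints["DoctorID"].isin(doctors["DoctorID"])]

# Total complaints per doctor, summed over complaint types.
totals = known.groupby("DoctorID")["Qty"].sum()

# Share of all doctors with at least one complaint.
share_with_complaint = len(totals) / len(doctors)
# Among complaining doctors, share with more than one complaint.
share_repeat = (totals > 1).mean()
# Share of complaint volume by type; dropna=False keeps unreported types.
type_share = (
    known.groupby("Complaint Type", dropna=False)["Qty"].sum()
    / known["Qty"].sum()
)
```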

    Bar plots of our complaint types and their frequencies are provided in figures 2 and 3.

    N.B. The complaints table contained some doctors who were not recorded in our doctors metadata table. Since their information was not provided, they were removed from the analysis.

    2.2.2. Complaints and purchases relationship

    There was no obvious pattern between doctors' purchases last year and their complaint records. However, there was a very interesting relationship:

    Every customer who has placed an order this year has made at least one complaint. Each current customer has an average of 3.58 complaints (roughly 4). This group comprises 74 of the doctors on our doctors list.
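
    The check behind this finding amounts to a set comparison between doctors with current orders and doctors with complaints. A minimal sketch with invented IDs and counts (all names hypothetical):

```python
# Doctors who placed an order this year (invented IDs).
current_customers = {"D1", "D3", "D7"}
# Total complaints per doctor (invented counts).
complaint_totals = {"D1": 4, "D3": 2, "D7": 5, "D8": 1}

# Current customers with no complaint on record; empty if the finding holds.
without_complaint = current_customers - set(complaint_totals)

# Average number of complaints per current customer.
avg_complaints = (
    sum(complaint_totals.get(d, 0) for d in current_customers)
    / len(current_customers)
)
```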

    This is very concerning: before our customer service gets overwhelmed, we need to establish whether there is a problem with our current devices and tackle it as fast as we can.

    2.3. New customer segmentation

    Historically, the sales and customer support departments have grouped doctors by geography. However, this proved not to be a good predictor of the number of purchases a doctor will make or their support needs. Therefore we need to use a data-centric approach to segmenting doctors to improve marketing, customer service, and product planning.

    This part is divided into three main sections:

    2.3.1. Choosing the candidate features, transforming and scaling them

    We want the final segments to be as conclusive as possible; however, there is a trade-off between the number of features and the interpretability of the final clusters. In addition, some features are highly correlated (Total Complaints and Current Purchases) or have a large proportion of missing values (Satisfaction and Instructions), and we did not want them to negatively impact our final result. The number of segments was another challenge. The ultimate goal was interpretability, so instead of focusing on the segments with the best inertia or silhouette scores, we looked for the segments that best distinguished the final groups. Both inertia and silhouette scores pointed to 4 clusters, but four segments did not produce interpretable results. The whole data pipeline is provided in the second part of the appendix.
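
    Inertia and silhouette scores for a range of cluster counts can be collected as in the sketch below. It uses three synthetic, well-separated blobs in place of the transformed doctor features, so the preferred k here says nothing about the real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three synthetic blobs standing in for the transformed doctor features.
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_: within-cluster sum of squares (lower means tighter clusters).
    # silhouette_score: mean separation quality in [-1, 1] (higher is better).
    scores[k] = (model.inertia_, silhouette_score(X, model.labels_))

# Cluster count with the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k][1])
```

    Inertia always falls as k grows, so it is usually read for an elbow rather than a minimum, while the silhouette score can be compared directly across k. Neither measures interpretability, which is why the final choice above rests on how well the segments can be told apart.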

    2.3.2. Defining a procedure to find the best possible customer groups

    At this stage, various feature combinations and numbers of clusters were tested, and each candidate result was examined. Each selected feature was transformed, scaled, and then passed through a dimensionality reduction technique so that the resulting segments would be as distinguishable as possible. The features R rate and Incidence rate, which describe the rework each doctor generates, were combined into a new variable named Rework, a linear combination of these scaled features. The final result uses Rework, Experience, Previous Purchases, and Current Purchases as its features, and the dataset was divided into three segments. The three final segments were named Demanding Purchasers, Devoted Veterans, and Conservative Freshmen. They are examined thoroughly in the next stage.
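
    The Rework variable can be illustrated as follows. The report does not give the weights of the linear combination, so the equal weights below are an assumption, as are the invented rates and the z-score scaling.

```python
import numpy as np

# Invented rework rates for five doctors.
incidence_rate = np.array([4.0, 5.0, 3.0, 6.0, 2.0])
r_rate = np.array([1.0, 1.5, 0.8, 2.0, 0.5])

def standardize(x):
    """Scale a feature to zero mean and unit variance."""
    return (x - x.mean()) / x.std()

# Assumption: equal weights; the report only says "linear combination".
rework = 0.5 * standardize(incidence_rate) + 0.5 * standardize(r_rate)
```

    Scaling before combining keeps either rate from dominating simply because it is measured on a larger scale.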

    2.3.3. Distinguishing each segment's characteristics

    The snake plot of these groups is provided in figure 4. From this figure we can infer the general characteristics of each group:

    What are their general characteristics?

    • Demanding Purchasers:
      • They have the highest number of purchases in the last year and the highest number of orders currently. They generate the most rework for our company.
    • Devoted Veterans:
      • They are long-time customers. They know our company well, but they tend to purchase less than the other two groups.
    • Conservative Freshmen:
      • They have the least experience with us; however, they show a pattern similar to that of Demanding Purchasers. If we can build their trust, they might purchase more in the future.

    How can we distinguish them from the other customer groups?

    • Demanding Purchasers:
      • They generally tend to complain more.
      • They produce the highest level of rework.
    • Devoted Veterans:
      • They are on average the least satisfied of the three groups.
      • They had the lowest number of purchases last year and orders this year.
    • Conservative Freshmen:
      • They have the least experience with our company so far.
      • On average, they have a low number of purchases from the previous year and orders now.

    Figure 5 shows a heatmap of various features across our newly defined customer segments. The most defining characteristics of these segments are Incidence rate, Current purchases, Previous purchases and Experience. Total Complaints is highly correlated with Current purchases and is therefore also considered important.
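
    One common way to build the numbers behind such a heatmap is to compare each segment's mean with the population mean, feature by feature. The sketch below assumes that approach (the report does not state how figure 5 was computed) and uses invented values and placeholder segment labels.

```python
import pandas as pd

# Invented feature values with placeholder segment labels A, B and C.
features = pd.DataFrame({
    "Segment": ["A", "A", "B", "B", "C", "C"],
    "Experience": [1.0, 1.2, 3.0, 3.4, 0.4, 0.6],
    "Purchases": [20.0, 24.0, 5.0, 7.0, 8.0, 8.0],
})

# Mean of each feature within each segment.
segment_means = features.groupby("Segment").mean()
# Mean of each feature over the whole population.
population_means = features.drop(columns="Segment").mean()

# Relative deviation from the population: 0 means "same as average",
# positive means above average, negative means below average.
relative = segment_means / population_means - 1
```

    Features whose relative values sit far from zero in at least one segment are the ones that set that segment apart.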

    Each customer group shows a distinct behavioral pattern and its own set of characteristics, and by carefully examining them we can make better strategic plans for our marketing and customer service teams. The following list gives suggestions to better support each customer group, based on their preferred style.

    • Demanding Purchasers:
      • Find their preferred communication style and set regular check-ups.
      • Tag them as high priority in our support system.
      • Regularly ask for their feedback.
    • Devoted Veterans:
      • Create a customer loyalty program and make sure to include them.
      • Offer them our loyalty discounts.
      • Notify them first about our promotions.
    • Conservative Freshmen:
      • Share our previous customers' success stories with them.
      • Ask them to join our social media.
      • Encourage them to voice their needs.

    The next table captures the essence of each group, comparing their attributes to our total population.

    3. Conclusion

    The key to creating and maintaining successful relationships with a business's different customer groups is understanding their preferred styles of communication and their unique needs. This helps us effectively and efficiently meet (and, hopefully, exceed) their expectations. Previously, the company segmented its customers based only on their geographical locations. In this analysis we investigated more characteristics and were able to segment these customers more accurately based on their behavioral patterns. We have identified three groups of customers:

    • Conservative Freshmen - They are the customers with the lowest level of experience and a low level of purchases.
    • Demanding Purchasers - They tend to make a high number of purchases, but they also complain much more. They generate the highest level of rework for the company.
    • Devoted Veterans - They are loyal customers with a lot of experience with our company, but the lowest level of purchases last year and of current orders.

    Furthermore, the customers who have a complaint on record are the ones who currently have an item on order. This calls for a serious investigation of our stock devices.

    4. Appendix

    4.1. Data munging and data transformation