Analyzing customer churn in Excel
📖 Background
Are you ready to apply your Excel skills? For subscription-based businesses, reducing customer churn is a top priority. In this Excel competition, you'll investigate a dataset from telecom company Databel and analyze their churn rates.
Analyzing churn doesn’t just mean knowing the churn rate: it’s also about figuring out why customers are churning at the rate they are and how to reduce churn. Answer these questions by creating calculated columns, building PivotTables, and creating an eye-catching dashboard.
Use this Datalab workbook to download the data and import it in Excel on your machine. When you're finished, share your work by attaching your .xlsx file to this workbook. To do so, click File > Show workbook files, and then upload the file, or by dragging and dropping the file on the editor.
💾 The data
The data consists of four parts:
- Customer status: the status and reason why customers churned
- Demographics: demographic data on each customer
- Contract information: information on the type of contract
- Subscription types & charges: numerical data on how the customer uses his subscription
More information about the data can be found here.
➡️ The data can be downloaded from the Files section: click File > Show workbook files.
💪 Challenge
- Using Excel, analyze the data and create visuals to answer the following questions:
- What is the average churn rate?
- What are the main reasons customers churn?
- Are there differences in churn rate between customers that consume less than 3 GB of data and more than that? Do you see differences between customers with an unlimited subscription?
- Don't feel limited by the questions above, you're encouraged to use your skills and creativity to make your worksheet your own.
- Create a screenshot of your (main) Excel worksheet, and paste it into the designated field hereunder.
- Summarize your findings in an executive summary.
✅ Checklist before publishing
- Make sure to upload your Excel
.xlsxfile to this Datalab workbook, by clicking File > Show workbook files, and then upload the file, or by dragging and dropping the file on the editor. - Remove redundant text cells like data background, challenge, and this checklist. You can add cells if necessary. Make your publication stand out!
📷 Dashboard screenshot
<Paste one screenshot of your (main) Excel worksheet here.>
🧾 Executive summary
<In a couple of lines, write your main findings here.>
🌐 Upload your Excel file
Upload your .xlsx file by clicking File > Show workbook files, and then upload the file.
✍️ Judging criteria
This competition will not be judged. Use this opportunity to apply your Excel skills, and compare your work with your peers after the competition ends!
⌛️ Time is ticking. Good luck!
Hi, my name is Brian, and I'm attempting this as a challenge to understand Excel methodology and work on my fundementals to analyse data as a whole.
While the other projects were aimed to be a subject I was proficient in, I am not familiar with churn rates and business terminology as a whole. Hence, this also highlights the importance of having good context research for projects.
With that, this project is segmented to answer these questions:
-
What is the average churn rate? This will mainly focus on Churn Label, as churn rate is depiected as the number of loss customers / total customers as a percentage.
-
What are the main reasons customers churn? This part will focus on the Churn Category, as we look at churn purposes that were indicate to give a clue on where to look next. While it is a data on it's own, it can lead to interesting insights that uses other columns that we can compare for both churned customers and non-churned customers to see the exact reason. In this component, there can always be a underlying reason why the reason for churn is given, and it is our job as data analysis to constantly ask why to find the root problem. While every single column will be analysed, the initial stage will give some hindsight on which to focus on, and highlight on whether they are an actual problem.
-
Are there differences in churn rate between customers that consume less than 3GB of data and more than that? Do you see differences between customers with an umlimited subscription? This question is very specific, and hence will look at the Plan column, AVG monthly GB download and Unlimited Data Plan. We will also look at Extra Data Charges for any interesting information.
Executive Summary
This executive summary will be in a form of a dashboard. While this is the summarised version, if you are interested in the process of the project, feel free to scroll more to see the thought processes of this project.
Exploratory Data Analysis
Before we do any data analysis of the data, we will need to do exploratory data analysis to check the data for any identifiables errors that might impact our result.
Exploratory Data Analysis and Cleaning of data in Excel is pretty simple. Since there are lesser data to work with with the nature of Excel, and how Excel is structured to allow live editing of the data, cleaning of the data can be done simply by using the filter option.
For example, by applying the filter option and checking on the individual columns, you can see various different unique data that a column contains, and that can tell you various errors an exploratory data analysis framework can spot, including wrong data, data that are missing etc.
For the Customer Service Calls columns, when you apply filter, this appears
In this column, there is nothing jaring that might needs to be pointed out. Most of the values are does not fall out of place, and there are no data that might be slightly deviated that can be edited.
One practice that however cannot be applied is the method of spotting outliers. Outliers can definitely skew the analysis, and it is necessary to take apprioriate action that is recommended based on the context itself. There are also columns that should not be touched because of how unique they are, for example, Customer ID, where each individual value is meant to be unique and should not be altered.
In this portion, we will be highlighting columns that are of interest to be edited slightly. We will be doing it in a duplicated tab, so the original datasheet is not lost due to cleaning.
Intl Calls, Intl Mins
These columns have data that have decimal points that can reach up to 9, and to better group them into their respective groups for better visualisation, we will use round them up to 2 decimal points. This added a additional column to include the 2 decimal point column and the original column side by side.
Intl Plans
This column had text that are written as "yes" and "no" instead of "Yes" and "No" like the rest of the columns. To keep them consistent, we applied this minor edit to another column with proper capitalisation.
What is the average Churn Rate?
With this, we can do it in two ways.
- Since the column is either yes or no, we can create an additional column to indicate 1 on "yes", and use that sum() to divide with count() to find the percentage
- We can create a pivot table to get the rough numbers to do the calculation
In this case, we use the Customer ID and Churn Label to create a pivot table. Add Churn Label to the Column section, and Customer ID to the Count of Values,
We get
It is interesting to also look at outliers in this portion, as there aren't any blank portions.
Hence, we can conclude that the churn rate is 1796/6687 * 100 = 26.86%
What are the main reasons people churn?
To start this off, we look at the Churn Category Column, specifically focusing on the Count of the Churn Category. Like before, we will use a Pivot table to do so. First of all, we will need to exclude that customers who have not churned. Afterwards, we create a pivot table solely for those who have churned, as well as creating a barchart to analyse which reasons are the most apparant as a visulisation to aid readability.
This is the pivot table without excluding those who are not churned
And this is the pivot table excluding those who are not churned
From the 2nd table, we can create a barplot to better visualise the count of each category
In this image itself, it is obvious that we can write off competitor as a major contribution to the reason why people churn. However, this does not reveal much about what makes the compeitor better than Databel. For that, since we do not have access to competitor's data, we must look at current data statistics to look at major differences between churned and non-churned columns, and how they differ to make a conclusion as to why people stick to Databel compared to why people do not stick to Databel. The other two major reasons, Attitude and Dissatisfaction, does reveal reasons why, but does not reveal the root cause.
We also look at the column Churn Reason to reveal the next step as to why people churned.
In this visualisation itself or even the pivot table itself, we can see a few reasons that have very high votes. These are mainly, "Competition has better offer", "Competitor has better devices" and "Attitude of support person". While we do not understand what under competition most of these reasons attribute to, we do understand that one portion of improvement that Databel has to consider is the behaviour of support calls and replies that are offered. However, this can still be attributed to many different other underlying root cause, hence we analyse the other columns to understand whats different between the two variants of customers when it comes to churning.
To better visualise when both Churn Category and Churn Reason are mixed,
The reasons for churning mainly stays the same as the above, as Churn Category is mainly to categorise these reasons into sub groups for better clarification.
We include back datas for not churned to compare them with the churned customers to see anymore underlying reasons as to why people churn. In this portion, we will only include columns that are of interest.
Account Length
When plotting account there with an area graph, we find some interesting retention in both variants of customers
To give abit of context, Account length is defined by the number of months a user has used Databel as their provider. While most customers would generally give it a go to see the full scale of it's properties, many has churned even in the first month. This could lead to some interesting insights such as
- Is Databel not user-friendly?
- Is Databel not easy to use?
- Is Databel not enticing for new users?
There could be many reasons why this could be the case, but definitely, one thing to reduce churn rates is to understand the new users and their challenges with Databel. Furthermore, as pointed by the not churned customers, there seems to be a good influx of loyal customers that has stayed for almost more than 70 months, which means that while Databel might not be the most interesting to new users, Databel does definitely retain their loyal customers base fairly well, with even an slight increase in not churned customers compared to thoes who have churned later on.
In this, we narrow down the newly churned customers and associate them to their reasoning as to churning.
Focusing on the blue side of the bar chart, we can see that reasons remain mainly the same, with mostly opting that the competitor is better than Databel. The chart mainly remains the same as the previous overall analysis, and hence v less conclusions can be make besides knowing that most churned customers are newcomers.
Calls
We will be collating both local calls and international calls together. It has an interesting insight. First of all, we plot the column to the Churn Labels, and the rows to Account Length. Afterwards, the values will be attributed to the average of local and internaional calls, mainly because it is generally unfair to compare the sum of the values, as the longer someone used Databel, naturally the more time they have. The average will allow us to see them in a more fair light on each account length.
In the following diagram, notice that those who hasn't churned from Databel has a substaintial equal average of international and local calls, but for those who churned, there is a spike in international calls instead of local calls. This could shed some interesting insights, as to maybe people were coming to Databel expecting a good interntional call technology, only to be disappointed and prefer the competitor instead. With the previous competition argument that was detected, this could be one of the linking factors as to why people were churning from Databel.
This trend is also detected similarly when we compare the duration of call of local vs international.
Plan vs Active
One jaring observation was the differences in international vs local in previous statement was the arise in international calls vs local calls in those who have churned and those who have not. One observation we can go into is whether there is a trend that customers have used the international calls with the international plan, or they used the interntional calls without a plan, that might incur additional charges. For that, we look into the graph below
To read the following graph, the first 4 bars on the left represents those that have not churned from Databel, while the right 4 bars represents those that have churned. Following that group, it comes whether the customer has subscribed to the international plan, with the left 2 bars in the group representing those that have not subscribed to the international plan, while the right 2 bars in the group representing those that have subscribed to the plan. In that subgroup, the left bar will present those that have been not internationally active, while those on the right represents those that can been internationally active, meaning they have been using international calls irregardless of whether they are subscribed to the international plan.
In this case, it is quite apparantly there is quite a number that no international plan, but yet chose to do international calls without the plan. Whether or not it is international, this does shed some light on the why people would churn. Of most that churned, they were not in the international plan, and yet a majority of them went interntionally active. Perhaps in this case, the international plan itself has to be refined to be more enticing and more description of it's description to prevent people from getting surcharges from unnecessary international calls.
To look at the surcharges made, we can include the averages of the extra international charges that were incurred.
From this image itself, it was reasoable to see that of those that are not in the international plan. they were charged with calls that were made internationally. Perhaps this could be one of the main reasons as to why people have churned, and perhaps why there was a lot of dissasfaction with the attitude of the support centers. However, that theory can be debunked by comparing the churning reasons with the extra charges.
While it is up to interpretation, this graph can be discussed on the main problem with the surcharges. However, one point is confirmed, is there is a number that are dissatisfied international plan.
Customers Service Calls
As noted before, one of the main reasons that people churned was the amount of dissatisfaction with the customer service calls. This is indeed true when we look at the statistics of the calls made.
Averagely, each person who churned needed to make at least 2 calls per account length, but this is lesser when considering those that didn't churned, which only limits to roughly 0.5 calls per person max.
It is interesting to note once sum is included in this equation, where the bulk of the calls were only made during the creation of new accounts, which could mean that many customers were having problems with the product at the initial stage, especially for those who have churned, and more so those that are internationally active.
Monthly Charges and Subscription
One point that was observed was the average prices of the monthly charged when comparing the two groups together. For those who have churned, their monthly charges was higher than than those who have not churned. With the monthly charges being huge, subscription was the first to look at. The monthly charges could be due to the international charges that were paid before, but it did not explain the charges not dipping after the first few months of the account length, but instead showed a slight increase.
One is to look at the contract type that customers had.
The contract type average monthly did not show much fluctuation between the groups. Hence, contract type is not the answer. However, there was one factor that might be factored in, and that is the group. It is indicated in the schema that a group contract generally offers advantages and is generally cheaper.
This does show an interesting trend, whereby in the case of non-grouping and grouping, there seems to be a generally higher average in non-groups. However, that does not explain why churned groups still face a higher average of monthly charges. What if the payment method had a weird trend?
Interesting enough, the payment method does have difference in charges. It does not show why there was still an increment of charges, but considering that there was a difference in debit payment, credit payment and physical check, there seems to be a priority to also look into such cases on why they are different.
The culprit for this is one of the previous stated before. It all comes down to whether they were internationally active or not, despite the plan.
What wasn't the issue
We checked with the unlimited data plan, and the average monthly GB download along with the extra data charges, and that did not signal any problems. Generally, it was quite equal on both sides, indicating there was not a huge problem, as both customers who churned and those that did not churn has the same trend.
Are there differences in churn rate between customers that consume less than 3GB of data and more than that?
To first do this, we must first allow the data to create a filter that can efficiently filter to do a comparison. For this, we add an additional column, where it will do a conditional statement, where the condition checks whether the data is more or less than 3GB
We then use a pivot to collate it like in the first question, as well as calculate the percentages.
This is the result
The first row of percentages shows when their respective groups on whether they are more than 3GB or not. To compare between the two groups, it seems that there are more people that churned when there is more than 3GB, compared to those that aren't more than 3GB by roughly 9%. When the 2nd row comes in, it becomes more obvious as there is difference in them. While we are not doing this in this case, it is wise that we perform a t test to test whether it is signficantly different, but however, this project will end off here, and offer that explanation in a much later date.
For now, the conclusionis that there is a difference between the 3GB difference, as more people are churning if they are using more than 3GB.