Abstract
Customer churn is a key performance indicator for most present-day companies and a significant business concern. Losing customers directly reduces an organization's margins and profitability, which is why analyzing customer churn is so important.
In this notebook, we work with a bank's customer dataset that records which customers have churned. From the churned customers, we look for the patterns they have in common when leaving. Once we understand these patterns, we build a machine learning model to predict whether current customers are at risk of churning, so that measures can be taken to prevent it.
Furthermore, understanding which factors influence churn lets us take steps to improve service quality and the retention of current and future customers.
1. Objective
To understand the characteristics of customers who have churned from the bank and reduce churn in current and future customers.
2. Business Context
Reducing the churn rate directly improves the profitability of the business. It also improves the bank's image among current and prospective customers, which in turn supports retention and further lowers churn.
3. Business Problem (Hypotheses)
- If a customer has complained about the service, the likelihood of churn increases.
- Customers in Germany are more likely to churn from the bank.
- Customers aged 35 to 55 tend to churn from the bank more frequently.
- Customers with only one bank product tend to churn more than customers with several products.
- If the customer is not an active member, they tend to churn from the company.
- Depending on the customer's tenure, there is more or less churn.
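Each hypothesis above compares the churn rate between groups of customers. A minimal sketch of that comparison for the age hypothesis is shown here. It assumes `Exited` is coded 0/1, uses a small made-up table (the rows are illustrative only, not taken from the bank dataset), and introduces a hypothetical helper named `churn_rate_by`:

```python
import pandas as pd

# Toy data for illustration only; these rows are NOT from the bank dataset.
toy = pd.DataFrame({
    "Age":    [25, 30, 40, 45, 50, 60, 38, 70],
    "Exited": [0,  0,  1,  1,  0,  0,  1,  0],
})

def churn_rate_by(df, column):
    """Share of churned customers (Exited == 1) within each value of `column`."""
    return df.groupby(column, observed=True)["Exited"].mean()

# Bucket ages into the bands named in the hypothesis: under 35, 35 to 55, over 55.
# pd.cut uses right-closed intervals, so the bins are (0, 34], (34, 55], (55, 120].
toy["AgeBand"] = pd.cut(
    toy["Age"],
    bins=[0, 34, 55, 120],
    labels=["<35", "35-55", ">55"],
)

rates = churn_rate_by(toy, "AgeBand")
print(rates)
```

The same helper could be pointed at other columns (for example `Complain` or `IsActiveMember`) to check the remaining hypotheses one at a time.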
4. Analytical Context
Variables:
- RowNumber: Row index number.
- CustomerID: Unique customer identifier.
- Surname: Customer's surname.
- CreditScore: Customer's credit score ranging from 350 to 850 points.
- Geography: Customer's country.
- Gender: Female or Male.
- Age: Customer's age.
- Tenure: Time the customer has been with the bank.
- Balance: Account balance.
- NumOfProducts: Number of products contracted by each customer.
- HasCrCard: 1 if they have a credit card, 0 if they don't.
- IsActiveMember: 1 if the member is active, 0 if inactive.
- EstimatedSalary: Customer's estimated annual salary.
- Exited: 1 if they have churned from the bank, 0 if they are still a customer.
- Complain: 1 if they have complained, 0 if they haven't.
- Satisfaction Score: Numeric value of customer satisfaction.
- Card Type: Diamond, Gold, Silver card type.
- Point Earned: Points earned within the bank.
5. Exploratory Analysis and Data Transformation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
customers = pd.read_csv('Customer-Churn-Records.csv')
customers.info()
customers
customers[['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary', 'Satisfaction Score', 'Point Earned']].describe()
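# Work on a labelled copy so the 0/1 columns in `customers` stay numeric for the filters and ratios used later
customer_churn = customers.copy()
customer_churn[["Exited", "Complain", "IsActiveMember", "HasCrCard"]] = customer_churn[["Exited", "Complain", "IsActiveMember", "HasCrCard"]].replace({1: "Yes", 0: "No"})
print(customer_churn.head())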
customers['Exited'].value_counts(normalize=True)
a. Exited Clients
About 20% of the clients in the dataset have exited, which does not align with the industry average.
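# Horizontal bar chart of the share of customers who stayed (0) vs. exited (1)
ax = customers['Exited'].value_counts(normalize=True).plot(kind='barh', color=['blue', 'red'], edgecolor='black', figsize=(6,4))
plt.title('Proportion of Customers Exiting')
# With barh, the proportion runs along the x-axis and the Exited classes along the y-axis
plt.xlabel('Proportion')
plt.ylabel('Exited')
# Label each bar with its percentage, just past the bar's right edge
for p in ax.patches:
    ax.annotate(f'{p.get_width():.2%}', (p.get_x() + p.get_width(), p.get_y() + p.get_height() / 2.), ha='left', va='center', xytext=(5, 0), textcoords='offset points')
plt.show()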
customers_still = customers[customers['Exited']==0]
churned = customers[customers['Exited']==1]
b. Geography Observation
Proportionally, Germany has a higher share of churned clients than Spain or France. Churn rate by country:
- France 16.17%
- Germany 32.44%
- Spain 16.67%
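Because `Exited` is coded 0/1, a per-country churn rate like the ones listed above is simply the mean of `Exited` within each country. A minimal sketch of that calculation, using a made-up table (illustrative rows only, so the resulting numbers differ from the bank's figures):

```python
import pandas as pd

# Toy data for illustration only; not the bank dataset.
toy = pd.DataFrame({
    "Geography": ["France", "France", "Germany", "Germany",
                  "Spain", "Spain", "Spain", "France"],
    "Exited":    [0, 1, 1, 1, 0, 0, 1, 0],
})

# The mean of a 0/1 flag within a group is that group's churn rate.
churn_by_country = toy.groupby("Geography")["Exited"].mean()

# Express the rates as percentages for reporting.
churn_pct = (churn_by_country * 100).round(2)
print(churn_pct)
```

Dividing the churned counts by the total counts per country is an equivalent route to the same numbers, as long as both series share the same country index.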
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
geo = customers['Geography'].value_counts()
ax1.bar(geo.index, geo.values)
ax1.set_title('Country Distribution')
ax1.set_xlabel('Countries')
ax1.set_ylabel('# of Clients')
ax1.grid(True)
geo_churn = churned['Geography'].value_counts()
ax2.bar(geo_churn.index, geo_churn.values, color='red')
ax2.set_title('Country Churned Distribution')
ax2.set_xlabel('Countries')
ax2.set_ylabel('# of Clients')
ax2.grid(True)
plt.tight_layout()
plt.show()
exited_c = geo_churn / geo
exited_c
c. Demographic Analysis
The charts below show a higher churn rate for women than for men.
- Female 25.07%
- Male 16.47%
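One way to obtain churn rates by gender like those above is a row-normalized cross-tabulation. The sketch below is an assumption about how such a table could be built, not necessarily the method used in this notebook, and it runs on made-up rows rather than the bank data:

```python
import pandas as pd

# Toy data for illustration only; not the bank dataset.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Female",
               "Male", "Male", "Male", "Male"],
    "Exited": [1, 1, 0, 0, 1, 0, 0, 0],
})

# normalize="index" divides each row by its row total,
# so the stayed/exited shares for each gender sum to 1.
gender_table = pd.crosstab(toy["Gender"], toy["Exited"], normalize="index")

# Column 1 holds the share of customers who exited.
female_rate = gender_table.loc["Female", 1]
male_rate = gender_table.loc["Male", 1]
print(gender_table)
```

Normalizing by row rather than over the whole table matters here: it keeps the comparison fair when the two groups differ in size.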