Account Churn Classification Model Shootout

Churn Classification Model Shootout

This dataset contains account information for phone plan customers, including a column that indicates whether each customer churned. Using the data compiled for each account, can we build a predictive model that tells us whether a customer will churn? Below is some data cleaning, EDA, and model testing!

# Import Packages and Modules
%matplotlib inline 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# From scikit-learn
from sklearn import preprocessing
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import roc_curve, auc, confusion_matrix, classification_report

Read in Churn Calls dataset as a dataframe called Churn

# Import the csv into a pandas DataFrame called Churn
Churn = pd.read_csv('Churn_Calls.csv', sep = ",")

Churn.head()

Churn.dtypes

Set the target variable to churn and move it to the first column.

# designate target variable name
targetName = 'churn'
#print(targetName)
targetSeries = Churn[targetName]
#remove target from current location and insert in column number 0
del Churn[targetName]
Churn.insert(0, targetName, targetSeries)
#reprint dataframe and see target is in position 0
Churn.head(10)

#Check for NaN values
Churn.isna().any()

Exploratory Data Analysis

#Create a bar chart of our target variable
groupby = Churn.groupby(targetName)
targetEDA=groupby[targetName].aggregate(len)
print(targetEDA)
plt.figure()
targetEDA.plot(kind='bar', grid=False)
plt.axhline(0, color='k')

# Summary statistics for the dataset
Churn.describe()

#Check out the variable correlation

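# Create correlation matrix from the numeric/boolean feature columns only
# (recent pandas versions raise an error if text columns are passed to corr())
corr_matrix = Churn.iloc[:, 1:].select_dtypes(include=['number', 'bool']).corr()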
corr_matrix

plt.figure(figsize=(15,15)) #need to adjust size as needed.
# np.bool was removed in NumPy 1.24; the built-in bool is the replacement
mask = np.zeros_like(corr_matrix, dtype=bool)
mask[np.triu_indices_from(mask)] = True

sns.heatmap(corr_matrix,
           vmin=-1,
           vmax=1,
           cmap='coolwarm',
           annot=True,
           mask=mask)
plt.show()

We can see that roughly 14% of the customers (707/5000) in our dataset have churned. Because churners are such a small share of the data, our model will need to be very precise to catch which customers could be a churn risk. Since roughly 86% of our customers don't churn, a model that always predicts "no churn" would already be about 86% accurate, so we need predictive accuracy above that level for this model to be useful.
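As a quick check of the arithmetic, here is a minimal sketch that derives the churn rate and the majority-class baseline from the two counts quoted above (707 churned out of 5000). The variable names are illustrative and not part of the original notebook.

```python
# Counts quoted in the text: 707 churned accounts out of 5000 total.
n_churned = 707
n_total = 5000

# Share of accounts that churned
churn_rate = n_churned / n_total

# Accuracy of a trivial model that always predicts "no churn"
baseline_accuracy = 1 - churn_rate

print(f"Churn rate: {churn_rate:.1%}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.1%}")
```

Any candidate model has to beat this baseline accuracy to add value, and with classes this imbalanced, recall and precision on the churn class are likely to be more informative than accuracy alone.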

We can also see from the correlation matrix that most of the variables are only weakly correlated, but a handful are heavily correlated with each other. For example, Total Daily Charge and Total Daily Min are highly correlated, which makes sense. The same is true of Total Night Charge and Total Night Min. These pairs may act like duplicate variables.
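Rather than reading pairs off the heatmap, the near-duplicate variables can also be listed programmatically. The sketch below is one possible approach, not the notebook's own method: the helper `highly_correlated_pairs`, the 0.95 threshold, the toy column names, and the per-minute rate are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.95):
    """Return (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    # Keep only the strict upper triangle so each pair is reported once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    pairs = upper.stack()
    # NaN entries (masked cells) compare as False and are filtered out here
    pairs = pairs[pairs.abs() >= threshold]
    return [(a, b, float(r)) for (a, b), r in pairs.items()]

# Toy stand-in for the churn data (hypothetical columns and values):
# the charge column is an exact multiple of the minutes column.
rng = np.random.default_rng(0)
minutes = rng.uniform(0, 350, size=200)
toy = pd.DataFrame({
    "total_day_minutes": minutes,
    "total_day_charge": minutes * 0.17,  # hypothetical per-minute rate
    "account_length": rng.integers(1, 250, size=200),
})

pairs = highly_correlated_pairs(toy)
print(pairs)
```

On the real data, the same call would be `highly_correlated_pairs(Churn.iloc[:, 1:])`. One column from each reported pair is a candidate to drop before modeling.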
