Telecom Churn Analysis with Random Forest
Telecom Customer Churn - Introduction
This dataset comes from an Iranian telecom company, with each row representing a customer over a one-year period. Along with a churn label, there is information on the customers' activity, such as call failures and subscription length.
Not sure where to begin? Scroll to the bottom to find challenges!
import pandas as pd
pd.read_csv("data/customer_churn.csv")
Data Dictionary
| Column | Explanation |
|---|---|
| Call Failure | number of call failures |
| Complains | binary (0: No complaint, 1: complaint) |
| Subscription Length | total months of subscription |
| Charge Amount | ordinal attribute (0: lowest amount, 9: highest amount) |
| Seconds of Use | total seconds of calls |
| Frequency of use | total number of calls |
| Frequency of SMS | total number of text messages |
| Distinct Called Numbers | total number of distinct phone numbers called |
| Age Group | ordinal attribute (1: younger age, 5: older age) |
| Tariff Plan | binary (1: Pay as you go, 2: contractual) |
| Status | binary (1: active, 2: non-active) |
| Age | age of customer |
| Customer Value | the calculated value of customer |
| Churn | class label (1: churn, 0: non-churn) |
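The coded columns in the dictionary can be checked against the data before any analysis. The sketch below is only an illustration: it assumes the CSV headers match the column names in the table exactly (the real file may spell them differently), and `EXPECTED_CODES`, `unexpected_codes` and the `toy` frame are hypothetical names and made-up values, not part of the dataset.

```python
import pandas as pd

# Allowed codes for the binary columns, copied from the data dictionary.
# Assumption: the CSV uses these exact column names.
EXPECTED_CODES = {
    "Complains": {0, 1},
    "Tariff Plan": {1, 2},
    "Status": {1, 2},
    "Churn": {0, 1},
}

def unexpected_codes(frame, expected=EXPECTED_CODES):
    """Return {column: set of values not listed in the dictionary}."""
    problems = {}
    for column, allowed in expected.items():
        if column not in frame.columns:
            continue  # skip columns that are absent instead of failing
        seen = set(frame[column].dropna().unique().tolist())
        extra = seen - allowed
        if extra:
            problems[column] = extra
    return problems

# Made-up rows for illustration; the 3 in "Tariff Plan" is deliberately invalid.
toy = pd.DataFrame({
    "Complains": [0, 1, 0],
    "Tariff Plan": [1, 2, 3],
    "Status": [1, 2, 1],
    "Churn": [0, 0, 1],
})
print(unexpected_codes(toy))
```

On the real data the same call would be `unexpected_codes(df)` once the file has been loaded into `df`; an empty dictionary means every coded column agrees with the table.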
Objective and Motivation
- I'm going to propose a model that predicts customer churn better than a baseline classifier.
- I'm going to underline the most important causes of customer churn.
- In chapter 3 I will explore the data, showing appropriate graphs and making observations about the findings.
- In chapter 4 I will design the model and show its performance and the most important variables for churned customers.
- Finally, in chapter 5 I will draw conclusions and make recommendations.
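To make the first goal concrete, here is a minimal sketch of what a baseline could look like. The baseline is not specified at this point in the notebook, so this assumes the simplest one: always predicting the most frequent class. The function name `majority_baseline` and the `toy_labels` list are hypothetical and the labels are made up for illustration.

```python
from collections import Counter

def majority_baseline(y_true):
    """Predict the most frequent label for every row and score the result."""
    majority_label, majority_count = Counter(y_true).most_common(1)[0]
    y_pred = [majority_label] * len(y_true)
    # Accuracy equals the share of the majority class.
    accuracy = majority_count / len(y_true)
    # Recall on the churn class (label 1): share of true churners that are caught.
    churners = sum(1 for y in y_true if y == 1)
    caught = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    churn_recall = caught / churners if churners else 0.0
    return majority_label, accuracy, churn_recall

# Made-up labels: 2 churners among 10 customers.
toy_labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
label, acc, recall = majority_baseline(toy_labels)
print(label, acc, recall)  # 0 0.8 0.0
```

The toy case shows why accuracy alone is a weak yardstick when churners are a minority: the baseline scores 0.8 accuracy while finding no churners at all, so a useful model should also be judged on how many churners it catches.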
Exploratory Data Analysis
## Basic Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
DataFrame definition
df = pd.read_csv("data/customer_churn.csv")
df.shape
Statistical description
df.describe()
Null Cells