Telecom Customer Churn

This dataset comes from an Iranian telecom company. Each row represents one customer over a one-year period. Along with a churn label, it records each customer's activity, such as call failures and subscription length.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
churn = pd.read_csv("data/customer_churn.csv")
print(churn.shape)
churn.head(100)

Data Dictionary

| Column | Explanation |
|---|---|
| Call Failure | number of call failures |
| Complaints | binary (0: No complaint, 1: complaint) |
| Subscription Length | total months of subscription |
| Charge Amount | ordinal attribute (0: lowest amount, 9: highest amount) |
| Seconds of Use | total seconds of calls |
| Frequency of use | total number of calls |
| Frequency of SMS | total number of text messages |
| Distinct Called Numbers | total number of distinct phone calls |
| Age Group | ordinal attribute (1: younger age, 5: older age) |
| Tariff Plan | binary (1: Pay as you go, 2: contractual) |
| Status | binary (1: active, 2: non-active) |
| Age | age of customer |
| Customer Value | the calculated value of customer |
| Churn | class label (1: churn, 0: non-churn) |

Source of dataset and source of dataset description.

Citation: Jafari-Marandi, R., Denton, J., Idris, A., Smith, B. K., & Keramati, A. (2020). Optimum Profit-Driven Churn Decision Making: Innovative Artificial Neural Networks in Telecom Industry. Neural Computing and Applications.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: Which age groups send more SMS messages than make phone calls?
  • 📊 Visualize: Create a plot visualizing the number of distinct phone calls by age group. Within the chart, differentiate between short, medium, and long calls (by the number of seconds).
  • 🔎 Analyze: Are there significant differences between the length of phone calls between different tariff plans?
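As a starting point for the Explore and Visualize challenges, here is a minimal sketch. It assumes the column names match the data dictionary above. The `toy` rows are made up for illustration, the helper names `sms_heavy_age_groups` and `label_call_length` are hypothetical, and splitting total seconds into terciles is just one possible definition of short, medium, and long.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the data dictionary.
toy = pd.DataFrame({
    "Age Group": [1, 1, 2, 2, 3, 3],
    "Frequency of use": [10, 20, 50, 40, 5, 5],
    "Frequency of SMS": [30, 25, 10, 20, 5, 6],
    "Seconds of Use": [600, 1500, 4000, 3200, 200, 300],
})


def sms_heavy_age_groups(df):
    """Return the age groups whose total SMS count exceeds their total call count."""
    totals = df.groupby("Age Group")[["Frequency of SMS", "Frequency of use"]].sum()
    mask = totals["Frequency of SMS"] > totals["Frequency of use"]
    return totals.index[mask].tolist()


def label_call_length(df, labels=("short", "medium", "long")):
    """Label each row by which tercile of 'Seconds of Use' it falls into."""
    return pd.qcut(df["Seconds of Use"], q=len(labels), labels=list(labels))


print(sms_heavy_age_groups(toy))
print(label_call_length(toy).tolist())
```

On the real data, pass `churn` instead of `toy`. The labels from `label_call_length` can then serve as the color or hue variable in a plot of distinct called numbers by age group.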

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You have just been hired by a telecom company. A competitor has recently entered the market and is offering an attractive plan to new customers. The telecom company is worried that this competitor may start attracting its customers.

You have access to a dataset of the company's customers, including whether customers churned. The telecom company wants to know whether you can use this data to predict whether a customer will churn. They also want to know what factors increase the probability that a customer churns.
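One possible first screen for churn-related factors is to rank the numeric columns by their correlation with the churn label. The sketch below assumes a 0/1 `Churn` column as in the data dictionary. The `toy` rows are made up and `churn_correlations` is a hypothetical helper. Correlation is only a rough indicator here: it is not a predictive model and says nothing about causation.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Complaints": [0, 0, 1, 1, 0, 1],
    "Subscription Length": [40, 36, 10, 12, 30, 8],
    "Churn": [0, 0, 1, 1, 0, 1],
})


def churn_correlations(df, target="Churn"):
    """Pearson correlation of each numeric column with the target, strongest first."""
    corr = df.select_dtypes("number").corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


print(churn_correlations(toy))
```

A classifier trained on a train/test split would be the natural next step for the prediction part of the scenario.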

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.

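import pandas as pd

churn = pd.read_csv("data/customer_churn.csv")

# Keep the columns from position 9 onward
# (Tariff Plan through Churn, assuming the data dictionary order)
age_churn = churn.iloc[:, 9:]

# Split into churned (1) and non-churned (0) customers
churn_pos = age_churn.query("Churn == 1")
churn_neg = age_churn.query("Churn == 0")
churn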

What is the average age of customers who churn? What status are they in?
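A minimal sketch of how these two questions could be answered, assuming the `Age`, `Status`, and `Churn` columns from the data dictionary. The `toy` rows are made up and `churner_profile` is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Age": [25, 30, 30, 45, 25, 30],
    "Status": [1, 2, 2, 1, 2, 1],
    "Churn": [1, 1, 0, 0, 1, 1],
})


def churner_profile(df):
    """Return the mean age of churned customers and their share per status."""
    churned = df[df["Churn"] == 1]
    mean_age = churned["Age"].mean()
    status_share = churned["Status"].value_counts(normalize=True).sort_index()
    return mean_age, status_share


mean_age, status_share = churner_profile(toy)
print(mean_age)
print(status_share)
```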

What's the ratio of complaints to churn? Is there a correlation between churn and a certain tariff plan or status? Or between complaints and a certain tariff plan or status?
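These questions could be approached roughly as follows. The sketch assumes 0/1 `Complaints` and `Churn` columns as described in the data dictionary. The `toy` rows are made up, and `complaints_to_churn_ratio` and `churn_rate_by` are hypothetical helper names.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Complaints": [1, 0, 0, 1, 0, 1],
    "Tariff Plan": [1, 1, 1, 2, 2, 1],
    "Churn": [1, 0, 0, 1, 0, 0],
})


def complaints_to_churn_ratio(df):
    """Number of customers who complained per customer who churned."""
    return df["Complaints"].sum() / df["Churn"].sum()


def churn_rate_by(df, column):
    """Churn rate in percent within each level of `column`."""
    return df.groupby(column)["Churn"].mean().mul(100).round(2)


print(complaints_to_churn_ratio(toy))
print(churn_rate_by(toy, "Tariff Plan"))
```

Calling `churn_rate_by(churn, "Status")` gives the same breakdown by status. Grouping on `"Complaints"` instead of `"Churn"` inside the helper would answer the complaints variant of the question.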

[Bar chart: "Age of those who Churn". X-axis: Churn; Y-axis: Age; color: Age]

[Bar chart: "Statuses by Age". X-axis: Status; Y-axis: Age; color: Age]

[Bar chart: "Churn rate per Subscription length + Amount of Complaints". X-axis: Subscription Length; Y-axis: Churn; color: Complaints]

[Bar chart: "Age and Customer Value of churned customers". X-axis: Age; Y-axis: Customer Value; color: Age]

[Bar chart: "Age and Customer Value of current customers". X-axis: Age; Y-axis: Customer Value; color: Age]

What's the churn rate? -> (lost customers / total customers) * 100

What's the highest/lowest Customer Value of churned customers?

What do the complaints refer to?

import pandas as pd

churn = pd.read_csv("data/customer_churn.csv")

# Churn rate = (churned customers / total customers) * 100
churned = churn.query("Churn == 1")
churn_rate = round(len(churned) / len(churn) * 100, 2)
print(f"Churn rate: {churn_rate}%")

# Highest and lowest Customer Value.
# Note: this looks at all customers; use `churned` instead of `churn`
# to restrict the result to churned customers.
max_val = churn["Customer Value"].max()
min_val = churn["Customer Value"].min()
print("Max. Customer Value:", max_val, "| Min. Customer Value:", min_val)

# Rows holding the minimum and maximum Customer Value, indexed by that value
max_cust_val = churn[churn["Customer Value"] == max_val]
min_cust_val = churn[churn["Customer Value"] == min_val]
minmax_custval = pd.concat([min_cust_val, max_cust_val]).set_index("Customer Value")
minmax_custval