
Telecom Customer Churn Prediction

Business Problem: Telecom companies lose revenue when customers churn. We need a model that identifies high-risk customers early for targeted retention actions.

Tools Used: Python, Pandas, Matplotlib, Seaborn, Scikit-learn

Success Metrics:

  1. Reduce churn rate
  2. Improve retention campaign ROI
  3. Prioritize high-value customers

This dataset is derived from a telecom company operating in Iran. It contains real customer-level information including demographics, call usage, SMS frequency, and plan details.
Each row represents one customer.

🔧 Project Workflow

This notebook is structured as follows:

  1. Data Preview & Understanding
  2. Exploratory Data Analysis
  3. Statistical Testing
  4. Churn Prediction using Machine Learning
  5. Model Evaluation & Insights

The goal is to translate behavioral data into actionable churn predictions.

1️⃣ Load Dataset

We begin by loading the customer churn dataset into memory and performing an initial structural inspection to understand the shape and completeness of the data.

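import pandas as pd
# Load the customer-level churn data (one row per customer)
churn = pd.read_csv("data/customer_churn.csv")
print(churn.shape)   # (rows, columns)
churn.head(100)      # preview the first 100 rows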
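The structural inspection described above can be condensed into one per-column table. The sketch below is a hypothetical helper, `structural_summary`, that reports dtype, missing count, missing share and number of distinct values. It runs on a small toy frame standing in for `churn`; the values are invented and are not the real data.

```python
import pandas as pd

def structural_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing count, missing share (%) and distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })

# Hypothetical toy frame standing in for `churn` (invented values)
toy = pd.DataFrame({
    "Call Failure": [8, 0, 10, None],
    "Complaints": [0, 0, 0, 1],
    "Churn": [0, 0, 0, 1],
})
print(toy.shape)
print(structural_summary(toy))
```

On the real data, the same call would be `structural_summary(churn)`.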

Data Dictionary

| Column | Explanation |
|---|---|
| Call Failure | number of call failures |
| Complaints | binary (0: no complaint, 1: complaint) |
| Subscription Length | total months of subscription |
| Charge Amount | ordinal attribute (0: lowest amount, 9: highest amount) |
| Seconds of Use | total seconds of calls |
| Frequency of use | total number of calls |
| Frequency of SMS | total number of text messages |
| Distinct Called Numbers | total number of distinct phone numbers called |
| Age Group | ordinal attribute (1: youngest, 5: oldest) |
| Tariff Plan | binary (1: pay as you go, 2: contractual) |
| Status | binary (1: active, 2: non-active) |
| Age | age of the customer |
| Customer Value | calculated value of the customer |
| Churn | class label (1: churn, 0: non-churn) |
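Several columns in the data dictionary are coded, so it is worth confirming that the data actually uses only the documented codes. The sketch below uses a hypothetical helper, `find_unexpected_codes`, with the allowed values copied from the dictionary. It is run on invented toy rows, not on the real dataset.

```python
import pandas as pd

# Allowed codes as documented in the data dictionary
EXPECTED_CODES = {
    "Complaints": {0, 1},
    "Charge Amount": set(range(0, 10)),
    "Age Group": set(range(1, 6)),
    "Tariff Plan": {1, 2},
    "Status": {1, 2},
    "Churn": {0, 1},
}

def find_unexpected_codes(df: pd.DataFrame) -> dict:
    """Return {column: sorted unexpected values} for columns that break the dictionary."""
    problems = {}
    for col, allowed in EXPECTED_CODES.items():
        if col not in df.columns:
            continue  # skip columns absent from this frame
        observed = set(df[col].dropna().tolist())  # plain Python scalars
        unexpected = observed - allowed
        if unexpected:
            problems[col] = sorted(unexpected)
    return problems

# Hypothetical rows: Status = 0 is not a documented code
toy = pd.DataFrame({
    "Complaints": [0, 1, 0],
    "Tariff Plan": [1, 2, 1],
    "Status": [1, 2, 0],
    "Churn": [0, 1, 0],
})
print(find_unexpected_codes(toy))  # {'Status': [0]}
```

An empty dictionary returned by `find_unexpected_codes(churn)` would mean every coded column matches its documentation.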
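import matplotlib.pyplot as plt
import seaborn as sns
churn.info()                                               # dtypes and non-null counts
print(churn.describe())                                    # summary statistics
print(churn.isna().sum())                                  # missing values per column
print(churn['Churn'].value_counts())                       # class counts
print(churn['Churn'].value_counts(normalize=True) * 100)   # class share in %
# Visualise the class balance of the target
sns.countplot(x='Churn', data=churn)
plt.title("Churn Class Distribution")
plt.show()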

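Before modelling, I always check data completeness, data types, and class balance. Here there are no missing values, but only about 16% of customers churn, so the classes are imbalanced (roughly 84:16). A model that predicted "no churn" for every customer would already be about 84% accurate while identifying zero churners. That is why I focus on precision, recall, and F1 rather than accuracy alone.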
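To make the accuracy argument concrete, here is a minimal sketch of a majority-class baseline using scikit-learn's `DummyClassifier`. The labels are synthetic, generated with about 16% positives to mirror the share above. They are not the real churn labels.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Synthetic labels with ~16% churners (an assumption, not the real data)
rng = np.random.default_rng(42)
y = (rng.random(1000) < 0.16).astype(int)
X = np.zeros((len(y), 1))  # features are ignored by a majority-class baseline

# Always predict the most frequent class (non-churn)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)
prec = precision_score(y, y_pred, zero_division=0)  # no positive predictions
rec = recall_score(y, y_pred)
f1 = f1_score(y, y_pred, zero_division=0)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Accuracy lands near 0.84 even though recall is 0. Any real model should therefore be judged against this baseline on recall and F1, not on accuracy.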

2️⃣ Exploratory Data Analysis (EDA)

EDA is conducted to find trends and behaviours linked to churn.

We start by analysing whether customers' preference for SMS versus calls varies by age group, since different communication styles may correlate with churn.

# Average SMS and call counts per customer within each age group
usage_by_age = churn.groupby("Age Group")[["Frequency of SMS", "Frequency of use"]].mean()
print(usage_by_age)

To avoid bias from different group sizes, I compare average usage per customer rather than totals. This gives a fairer comparison of behaviour across age groups.
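A tiny illustration of why totals mislead when group sizes differ. The toy data is invented: age group 2 has ten customers who each send about 10 SMS, and age group 5 has two customers who each send about 40.

```python
import pandas as pd

# Invented toy data: a large low-usage group and a small high-usage group
toy = pd.DataFrame({
    "Age Group": [2] * 10 + [5] * 2,
    "Frequency of SMS": [10, 12, 8, 11, 9, 10, 10, 12, 8, 10, 40, 44],
})

grouped = toy.groupby("Age Group")["Frequency of SMS"]
totals = grouped.sum()    # driven by how many customers are in each group
means = grouped.mean()    # usage per customer
sizes = grouped.size()

print(pd.DataFrame({"customers": sizes, "total_sms": totals, "mean_sms": means}))
```

Totals rank group 2 first (100 vs 84 SMS). Per-customer means show that group 5 texts about four times as much (42 vs 10), which is the comparison that matters for behaviour.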