Telecom Customer Churn Analysis

Telecom Customer Churn Prediction

Business Problem: Telecom companies lose revenue when customers churn. We need a model that identifies high-risk customers early for targeted retention actions.

Tools Used: Python, Pandas, Matplotlib, Seaborn, Scikit-learn

Success Metrics:

Reduce churn rate
Improve retention campaign ROI
Prioritize high-value customers

This dataset is derived from a telecom company operating in Iran. It contains real customer-level information including demographics, call usage, SMS frequency, and plan details.
Each row represents one customer.

🔧 Project Workflow

This notebook is structured as follows:

Data Preview & Understanding
Exploratory Data Analysis
Statistical Testing
Churn Prediction using Machine Learning
Model Evaluation & Insights

The goal is to translate behavioral data into actionable churn predictions.

1️⃣ Load Dataset

We begin by loading the customer churn dataset into memory and performing an initial structural inspection to understand the shape and completeness of the data.

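import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the customer churn data, then inspect its dimensions and first rows
churn = pd.read_csv("data/customer_churn.csv")
print(churn.shape)
churn.head(100)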

Data Dictionary

Column	Explanation
Call Failure	number of call failures
Complaints	binary (0: No complaint, 1: complaint)
Subscription Length	total months of subscription
Charge Amount	ordinal attribute (0: lowest amount, 9: highest amount)
Seconds of Use	total seconds of calls
Frequency of use	total number of calls
Frequency of SMS	total number of text messages
Distinct Called Numbers	total number of distinct phone numbers called
Age Group	ordinal attribute (1: younger age, 5: older age)
Tariff Plan	binary (1: Pay as you go, 2: contractual)
Status	binary (1: active, 2: non-active)
Age	age of customer
Customer Value	the calculated value of customer
Churn	class label (1: churn, 0: non-churn)
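One way to confirm that the loaded data agrees with the dictionary above is to compare each coded column against its documented values. The sketch below is illustrative only: `EXPECTED_CODES` and `find_unexpected_codes` are hypothetical names, the toy rows are invented, and it assumes the ordinal columns take every integer between the stated endpoints (0 to 9 for Charge Amount, 1 to 5 for Age Group), which the dictionary does not state explicitly.

```python
import pandas as pd

# Allowed codes per column, transcribed from the data dictionary.
# Assumption: ordinal columns use every integer between their endpoints.
EXPECTED_CODES = {
    "Complaints": {0, 1},
    "Charge Amount": set(range(0, 10)),
    "Age Group": set(range(1, 6)),
    "Tariff Plan": {1, 2},
    "Status": {1, 2},
    "Churn": {0, 1},
}


def find_unexpected_codes(df, expected=EXPECTED_CODES):
    """Return {column: sorted unexpected values} for coded columns present in df."""
    problems = {}
    for column, allowed in expected.items():
        if column not in df.columns:
            continue  # skip coded columns that this frame does not contain
        observed = set(df[column].dropna().unique().tolist())
        unexpected = observed - allowed
        if unexpected:
            problems[column] = sorted(unexpected)
    return problems


# Toy rows for illustration only; these are not taken from the real dataset.
toy = pd.DataFrame({
    "Complaints": [0, 1, 0],
    "Tariff Plan": [1, 2, 3],  # 3 is deliberately outside the documented codes
    "Churn": [0, 0, 1],
})
print(find_unexpected_codes(toy))
```

Calling `find_unexpected_codes(churn)` on the loaded frame would apply the same check to the real data; an empty dictionary would mean every coded column stays within its documented range.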

churn.info()

churn.describe()

churn.isna().sum()

churn['Churn'].value_counts()

churn['Churn'].value_counts(normalize=True) * 100

sns.countplot(x='Churn', data=churn)
plt.title("Churn Class Distribution")
plt.show()

Before modelling, I always check data completeness, data types, and class balance. Here there are no missing values, but only about 16% of customers churned, so the classes are imbalanced: a model that predicts "no churn" for every customer would already score about 84% accuracy. That is why I focus on precision, recall, and F1 rather than accuracy alone.
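The concern about accuracy on imbalanced classes can be checked with a small worked example. The sketch below uses synthetic labels (16 churners out of 100, chosen only to mirror the approximate churn rate, not drawn from the dataset) and a trivial baseline that predicts non-churn for everyone. The metrics are computed by hand from the confusion counts to make the formulas visible; scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` could be used instead.

```python
# Synthetic labels for illustration only: 16 churners out of 100 customers.
y_true = [1] * 16 + [0] * 84

# A trivial baseline that predicts "no churn" (0) for every customer.
y_pred = [0] * 100

# Confusion-matrix counts, treating churn (1) as the positive class.
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero when the model never predicts churn.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

The baseline reaches 0.84 accuracy while finding none of the churners (recall and F1 are both 0), which is the failure that accuracy alone would hide.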

2️⃣ Exploratory Data Analysis (EDA)

EDA is conducted to find trends or behaviors linked to churn.

We start by analyzing whether customer preferences for SMS versus calls vary by age group, since different communication styles may correlate with churn.

# Mean SMS count and mean call count per customer within each age group
usage_by_age = churn.groupby("Age Group")[["Frequency of SMS", "Frequency of use"]].mean()
print(usage_by_age)

To avoid bias from different group sizes, I look at average usage per customer rather than totals. This gives a fairer comparison of behaviour across age groups.
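The effect of group size on totals can be seen in a small invented example. The values below are made up purely for illustration and do not come from the dataset; only the column names follow the data dictionary.

```python
import pandas as pd

# Toy data for illustration only: one heavy texter in age group 1,
# four light texters in age group 2.
toy = pd.DataFrame({
    "Age Group": [1, 2, 2, 2, 2],
    "Frequency of SMS": [100, 30, 30, 30, 30],
})

grouped = toy.groupby("Age Group")["Frequency of SMS"]
summary = pd.DataFrame({
    "customers": grouped.size(),   # number of customers in each group
    "total_sms": grouped.sum(),    # grows with group size
    "mean_sms": grouped.mean(),    # per-customer average
})
print(summary)
```

By total, group 2 appears to text more (120 versus 100) only because it has four customers; the per-customer mean shows that the group 1 customer texts more than three times as often (100 versus 30).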