Telecom Customer Churn Analysis
A real-world project to predict churn and uncover customer behavior insights
using a dataset from an Iranian telecom company.
Tools Used: Python, Pandas, Matplotlib, Seaborn, Scikit-learn
Objective: Use data exploration and machine learning to understand customer behavior and predict churn likelihood.
This dataset is derived from a telecom company operating in Iran. It contains real customer-level information including demographics, call usage, SMS frequency, and plan details.
Each row represents one customer.
🔧 Project Workflow
This notebook is structured as follows:
- Data Preview & Understanding
- Exploratory Data Analysis
- Statistical Testing
- Churn Prediction using Machine Learning
- Model Evaluation & Insights
The goal is to translate behavioral data into actionable churn predictions.
1️⃣ Load Dataset
We begin by loading the customer churn dataset into memory and performing an initial structural inspection to understand the shape and completeness of the data.
import pandas as pd
churn = pd.read_csv("data/customer_churn.csv")
print(churn.shape)
churn.head(100)
Data Dictionary
| Column | Explanation |
|---|---|
| Call Failure | number of call failures |
| Complaints | binary (0: No complaint, 1: complaint) |
| Subscription Length | total months of subscription |
| Charge Amount | ordinal attribute (0: lowest amount, 9: highest amount) |
| Seconds of Use | total seconds of calls |
| Frequency of use | total number of calls |
| Frequency of SMS | total number of text messages |
| Distinct Called Numbers | total number of distinct phone numbers called |
| Age Group | ordinal attribute (1: younger age, 5: older age) |
| Tariff Plan | binary (1: Pay as you go, 2: contractual) |
| Status | binary (1: active, 2: non-active) |
| Age | age of customer |
| Customer Value | the calculated value of customer |
| Churn | class label (1: churn, 0: non-churn) |
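The loading step mentions checking completeness, which `shape` and `head` alone do not show. A minimal sketch of such a check follows, assuming a hypothetical helper `summarize_completeness` and a made-up toy frame standing in for `churn`; only the column names are taken from the data dictionary.

```python
import numpy as np
import pandas as pd


def summarize_completeness(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, distinct count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(),  # NaN is not counted as a value
        }
    )


# Hypothetical toy frame; the values are invented for illustration only.
toy = pd.DataFrame(
    {
        "Complaints": [0, 1, 0, 0],
        "Tariff Plan": [1, 2, 1, 1],
        "Age": [25, 30, np.nan, 45],
        "Churn": [0, 1, 0, 0],
    }
)

summary = summarize_completeness(toy)
print(summary)
print("duplicate rows:", toy.duplicated().sum())
```

On the real data, the same call would be `summarize_completeness(churn)`. Columns described as binary in the dictionary should then report exactly two distinct values.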
2️⃣ Exploratory Data Analysis (EDA)
EDA is conducted to find trends or behaviors linked to churn.
We start by checking whether the balance between SMS and call usage varies by age group, since different communication styles may correlate with churn.
usage_by_age = churn.groupby("Age Group")[["Frequency of SMS", "Frequency of use"]].sum()
print(usage_by_age)
sms_more = usage_by_age[usage_by_age["Frequency of SMS"] > usage_by_age["Frequency of use"]]
print(sms_more)
sms_more_age_group = sms_more.index.tolist()
print(sms_more_age_group)
💡 Insight
In Age Groups 2 and 3, the total number of SMS messages exceeds the total number of calls.
This suggests a preference for asynchronous communication in younger demographics, a behavior telecoms might leverage for targeted messaging plans.
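The comparison behind this insight can also be expressed as a ratio, which shows how strongly each group leans toward SMS rather than only whether it does. The sketch below uses a hypothetical toy frame with invented values; the column name `SMS per call` is an assumption introduced here, not part of the dataset.

```python
import pandas as pd

# Hypothetical toy data; only the first three column names come from the dataset.
toy = pd.DataFrame(
    {
        "Age Group": [1, 1, 2, 2, 3, 3],
        "Frequency of SMS": [5, 10, 80, 120, 90, 60],
        "Frequency of use": [40, 50, 60, 70, 50, 50],
    }
)

totals = toy.groupby("Age Group")[["Frequency of SMS", "Frequency of use"]].sum()

# A ratio above 1 means the group sent more texts than it placed calls.
totals["SMS per call"] = totals["Frequency of SMS"] / totals["Frequency of use"]

sms_heavy_groups = totals.index[totals["SMS per call"] > 1].tolist()
print(totals)
print(sms_heavy_groups)
```

Replacing `toy` with `churn` would give the per-group ratios for the real data.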
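Next, we categorize users by their total call time (Seconds of Use): Short for under 1,391 seconds, Medium for 1,391 up to but not including 2,990 seconds, and Long for 2,990 seconds or more. We then examine how the number of distinct called numbers varies by age group and call length category.
This helps determine whether different age segments use voice services differently, which is useful for plan design and churn risk analysis.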
print(churn['Seconds of Use'].describe())
def call_length_category(seconds):
    if seconds < 1391:
        return 'Short'
    elif seconds < 2990:
        return 'Medium'
    else:
        return 'Long'
churn['Call Length Category'] = churn['Seconds of Use'].apply(call_length_category)
print(churn.head(6))
grouped = churn.groupby(["Age Group", "Call Length Category"])["Distinct Called Numbers"].sum().reset_index()
print(grouped)
import seaborn as sns
import matplotlib.pyplot as plt
sns.barplot(data=grouped, x='Age Group', y='Distinct Called Numbers', hue='Call Length Category', hue_order=['Short', 'Medium', 'Long'])
plt.show()
plt.show()
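As an alternative to the row-by-row `apply`, the same binning done by `call_length_category` can be sketched with `pandas.cut`. This is one possible vectorized form, not the notebook's method; the sample values below are made up, and only the two cut points (1391 and 2990) come from the function above.

```python
import numpy as np
import pandas as pd

# Same cut points as call_length_category; open-ended on both sides.
bins = [-np.inf, 1391, 2990, np.inf]
labels = ["Short", "Medium", "Long"]

# Invented sample values chosen to sit on and around the boundaries.
seconds = pd.Series([0, 1390, 1391, 2989, 2990, 10000], name="Seconds of Use")

# right=False makes each bin include its left edge, matching the `<` comparisons.
categories = pd.cut(seconds, bins=bins, labels=labels, right=False)
print(categories.tolist())
```

The result is an ordered categorical (Short < Medium < Long), so sorting by this column follows the category order rather than alphabetical order.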