Telecom Customer Churn Analysis

A real-world project to predict churn and uncover customer behavior insights
using a dataset from an Iranian telecom company.

Tools Used: Python, Pandas, Matplotlib, Seaborn, Scikit-learn
Objective: Use data exploration and machine learning to understand customer behavior and predict churn likelihood.

This dataset is derived from a telecom company operating in Iran. It contains real customer-level information including demographics, call usage, SMS frequency, and plan details.
Each row represents one customer.

🔧 Project Workflow

This notebook is structured as follows:

  1. Data Preview & Understanding
  2. Exploratory Data Analysis
  3. Statistical Testing
  4. Churn Prediction using Machine Learning
  5. Model Evaluation & Insights

The goal is to translate behavioral data into actionable churn predictions.

1️⃣ Load Dataset

We begin by loading the customer churn dataset into memory and performing an initial structural inspection to understand the shape and completeness of the data.

import pandas as pd
churn = pd.read_csv("data/customer_churn.csv")
print(churn.shape)
churn.head(100)
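
The completeness check mentioned above could be sketched roughly as follows. The helper name `structure_report` is hypothetical, and the small DataFrame is made-up illustration data rather than rows from the real file; only the column names are borrowed from the dataset.

```python
import pandas as pd

def structure_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })

# Made-up sample standing in for the real dataset (illustration only).
sample = pd.DataFrame({
    "Call Failure": [8, 0, 10, None],
    "Complaints": [0, 0, 1, 0],
    "Churn": [0, 0, 1, 0],
})

report = structure_report(sample)
print(sample.shape)
print(report)
```

On the real data the same helper would be called as `structure_report(churn)`; any column with a non-zero `missing` count would need attention before modeling.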

Data Dictionary

Column                   Explanation
Call Failure             number of call failures
Complaints               binary (0: No complaint, 1: complaint)
Subscription Length      total months of subscription
Charge Amount            ordinal attribute (0: lowest amount, 9: highest amount)
Seconds of Use           total seconds of calls
Frequency of use         total number of calls
Frequency of SMS         total number of text messages
Distinct Called Numbers  total number of distinct phone calls
Age Group                ordinal attribute (1: younger age, 5: older age)
Tariff Plan              binary (1: Pay as you go, 2: contractual)
Status                   binary (1: active, 2: non-active)
Age                      age of customer
Customer Value           the calculated value of customer
Churn                    class label (1: churn, 0: non-churn)
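
One way to put the dictionary to work is to check that the coded columns only contain the values it lists. The sketch below assumes the stated ranges are exhaustive, which the source does not guarantee; `EXPECTED_CODES` and `find_unexpected_codes` are hypothetical names, and the sample rows are invented for illustration.

```python
import pandas as pd

# Allowed codes as stated in the data dictionary (assumed to be exhaustive).
EXPECTED_CODES = {
    "Complaints": {0, 1},
    "Charge Amount": set(range(0, 10)),
    "Age Group": set(range(1, 6)),
    "Tariff Plan": {1, 2},
    "Status": {1, 2},
    "Churn": {0, 1},
}

def find_unexpected_codes(df, expected=EXPECTED_CODES):
    """Map each coded column to the set of values the dictionary does not list."""
    problems = {}
    for column, allowed in expected.items():
        if column not in df.columns:
            continue  # skip coded columns that are absent from this frame
        observed = set(df[column].dropna().unique().tolist())
        unexpected = observed - allowed
        if unexpected:
            problems[column] = unexpected
    return problems

# Made-up rows; the 3 in "Tariff Plan" is a deliberate error for the demo.
sample = pd.DataFrame({
    "Complaints": [0, 1, 0],
    "Tariff Plan": [1, 2, 3],
    "Churn": [0, 0, 1],
})
problems = find_unexpected_codes(sample)
print(problems)
```

An empty result means every coded column matches the dictionary; a non-empty one points either to dirty data or to a dictionary entry that understates the real range.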

2️⃣ Exploratory Data Analysis (EDA)

EDA is conducted to find trends or behaviors linked to churn.

We start by checking whether the balance between SMS and call usage varies by age group, since different communication styles may correlate with churn.

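# Total SMS count and total call count per age group
usage_by_age = churn.groupby("Age Group")[["Frequency of SMS", "Frequency of use"]].sum()
print(usage_by_age)

# Keep only the age groups whose SMS total exceeds their call total
sms_more = usage_by_age[usage_by_age["Frequency of SMS"] > usage_by_age["Frequency of use"]]
print(sms_more)

# Age group labels of those SMS-heavy groups
sms_more_age_group = sms_more.index.tolist()
print(sms_more_age_group)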

💡 Insight

In aggregate, Age Groups 2 and 3 send more SMS messages than they make phone calls.
This suggests a preference for asynchronous communication in the younger-to-middle age groups, a behavior telecoms might leverage for targeted messaging plans.
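
To see how strong the preference is, and not only which groups cross the line, the totals can be expressed as an SMS-per-call ratio. This is a minimal sketch on invented numbers; `sms_to_call_ratio` is a hypothetical helper, and a group with zero calls would produce an infinite ratio that needs separate handling.

```python
import pandas as pd

def sms_to_call_ratio(df: pd.DataFrame) -> pd.Series:
    """Total SMS divided by total calls for each age group."""
    totals = df.groupby("Age Group")[["Frequency of SMS", "Frequency of use"]].sum()
    ratio = totals["Frequency of SMS"] / totals["Frequency of use"]
    return ratio.rename("sms_per_call")

# Made-up rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "Age Group": [1, 1, 2, 2, 3],
    "Frequency of SMS": [5, 5, 60, 40, 10],
    "Frequency of use": [20, 30, 25, 25, 40],
})

ratio = sms_to_call_ratio(sample)
print(ratio)

# Groups with a ratio above 1 send more SMS than they make calls.
print(ratio[ratio > 1].index.tolist())
```

A ratio above 1 corresponds to the "more SMS than calls" condition used above, while the magnitude shows how far each group leans toward messaging.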

Next, we categorize users by their total call time (Seconds of Use). We then examine how call time and contact variety (Distinct Called Numbers) vary by age group.

This helps determine whether different age segments use voice services differently, which is useful for plan design and churn risk analysis.

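import matplotlib.pyplot as plt
import seaborn as sns

# Summary statistics for total call time per customer
print(churn['Seconds of Use'].describe())

def call_length_category(seconds):
    """Bucket a customer's total call seconds into Short, Medium, or Long."""
    if seconds < 1391:
        return 'Short'
    elif seconds < 2990:
        return 'Medium'
    else:
        return 'Long'

# Label every customer with a call-length category
churn['Call Length Category'] = churn['Seconds of Use'].apply(call_length_category)
print(churn.head(6))

# Total distinct called numbers per age group and call-length category
grouped = churn.groupby(["Age Group", "Call Length Category"])["Distinct Called Numbers"].sum().reset_index()
print(grouped)

# Grouped bar chart: one cluster per age group, one bar per category
sns.barplot(data=grouped, x='Age Group', y='Distinct Called Numbers', hue='Call Length Category', hue_order=['Short', 'Medium', 'Long'])
plt.show()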
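
The cut-offs 1391 and 2990 are hard-coded. Assuming they were read off the `describe()` output as the first quartile and the median (the notebook does not say so explicitly), the same bucketing could be derived from the data instead. The sketch below uses `pd.cut` on invented values; `categorize_by_quantiles` is a hypothetical helper.

```python
import pandas as pd

def categorize_by_quantiles(seconds: pd.Series,
                            lower_q: float = 0.25,
                            upper_q: float = 0.50) -> pd.Series:
    """Label values Short / Medium / Long using cut-offs computed from the data."""
    low, high = seconds.quantile([lower_q, upper_q])
    bins = [float("-inf"), low, high, float("inf")]
    # right=False makes each bin [left, right), matching the `<` comparisons
    # in call_length_category.
    return pd.cut(seconds, bins=bins, labels=["Short", "Medium", "Long"], right=False)

# Made-up total call seconds for eight customers (illustration only).
seconds = pd.Series([100, 500, 1000, 2000, 3000, 4000, 8000, 9000])
categories = categorize_by_quantiles(seconds)
print(categories.value_counts())
```

On the real data this would be `churn['Seconds of Use']` in place of `seconds`. If the two quantiles happen to coincide, `pd.cut` rejects the duplicate bin edges, so heavily tied data would need different cut-offs.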