
🪙 Bank Marketing

This dataset describes phone-based direct marketing campaigns run by a Portuguese banking institution. The campaigns aimed to sell subscriptions to a bank term deposit (see variable y).

import pandas as pd

# The file is semicolon-separated, so sep=";" is required
bm = pd.read_csv("bank-marketing.csv", sep=";")
bm.head()
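The raw file uses semicolons rather than commas as the field separator, which is why `sep=";"` is passed above. A minimal sketch on a small, hypothetical in-memory sample (not rows from the real file) shows what happens with and without it:

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same semicolon-separated layout (not real data)
sample = "age;job;y\n56;housemaid;no\n37;services;yes\n"

# Without sep=";", pandas splits on commas, so each whole line becomes one column
wrong = pd.read_csv(io.StringIO(sample))
# With sep=";", the fields are split into separate columns
right = pd.read_csv(io.StringIO(sample), sep=";")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```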

Data Dictionary

| Column | Variable | Class |
| --- | --- | --- |
| age | age of customer | numeric |
| job | type of job | categorical: "admin.", "blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed", "services", "student", "technician", "unemployed", "unknown" |
| marital | marital status | categorical: "divorced", "married", "single", "unknown"; note: "divorced" means divorced or widowed |
| education | highest degree of customer | categorical: "basic.4y", "basic.6y", "basic.9y", "high.school", "illiterate", "professional.course", "university.degree", "unknown" |
| default | has credit in default? | categorical: "no", "yes", "unknown" |
| housing | has housing loan? | categorical: "no", "yes", "unknown" |
| loan | has personal loan? | categorical: "no", "yes", "unknown" |
| contact | contact communication type | categorical: "cellular", "telephone" |
| month | last contact month of year | categorical: "jan", "feb", "mar", ..., "nov", "dec" |
| day_of_week | last contact day of the week | categorical: "mon", "tue", "wed", "thu", "fri" |
| campaign | number of contacts performed during this campaign and for this client | numeric; includes last contact |
| pdays | number of days that passed after the client was last contacted from a previous campaign | numeric; 999 means client was not previously contacted |
| previous | number of contacts performed before this campaign and for this client | numeric |
| poutcome | outcome of the previous marketing campaign | categorical: "failure", "nonexistent", "success" |
| emp.var.rate | employment variation rate (quarterly indicator) | numeric |
| cons.price.idx | consumer price index (monthly indicator) | numeric |
| cons.conf.idx | consumer confidence index (monthly indicator) | numeric |
| euribor3m | Euribor 3-month rate (daily indicator) | numeric |
| nr.employed | number of employees (quarterly indicator) | numeric |
| y | has the client subscribed to a term deposit? | binary: "yes", "no" |
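Because `pdays` uses 999 as a placeholder for "not previously contacted", averaging the raw column mixes real day counts with that code. A minimal sketch on a hypothetical toy frame that moves the placeholder into its own flag and replaces it with a missing value; the column names `previously_contacted` and `pdays_clean` are illustrative, not part of the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; 999 is the data dictionary's "not previously contacted" code
df = pd.DataFrame({"pdays": [999, 3, 999, 6]})

# Hypothetical flag: True for clients contacted in a previous campaign
df["previously_contacted"] = df["pdays"] != 999
# Hypothetical cleaned column: the 999 code becomes NaN so statistics use real day counts only
df["pdays_clean"] = df["pdays"].replace(999, np.nan)

print(df["pdays"].mean())        # 501.75, distorted by the 999 code
print(df["pdays_clean"].mean())  # 4.5
```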

Source of dataset.

Citations:

  • S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
  • S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimaraes, Portugal, October 2011. EUROSIS.


πŸ—ΊοΈ What are the jobs of the people most likely to subscribe to a term deposit?

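Administrators, technicians, and blue-collar workers together made up over 50% of term deposit subscribers. This is a share of all subscribers rather than a per-job subscription rate, so it also reflects how many clients in each job were contacted.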

import matplotlib.pyplot as plt
import pandas as pd

# Assuming bm is already defined as a DataFrame
# Filter the dataframe to include only those who subscribed to a term deposit
subscribers = bm[bm['y'] == 'yes']

# Count the number of subscribers for each job
job_counts = subscribers['job'].value_counts()

# Keep the top 7 jobs and group the rest under "other"
# (value_counts() already sorts in descending order, so positional slicing with iloc is safe)
top_jobs = job_counts.iloc[:7]
other_jobs_count = job_counts.iloc[7:].sum()
job_counts_top7 = pd.concat([top_jobs, pd.Series(other_jobs_count, index=['other'])])

# Plot a pie chart
plt.figure(figsize=(10, 8))
plt.pie(job_counts_top7, labels=job_counts_top7.index, autopct='%1.1f%%', startangle=140)
plt.title('Job Distribution of Term Deposit Subscribers', pad=30)  # Increase the padding between the title and the chart
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
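The "top 7 plus other" grouping above can be wrapped in a small helper so the cutoff becomes a parameter. A minimal sketch; `top_n_with_other` is a hypothetical name and the counts are made-up values, not figures from the dataset:

```python
import pandas as pd


def top_n_with_other(counts: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: keep the n largest counts and sum the rest into 'other'."""
    counts = counts.sort_values(ascending=False)
    top = counts.iloc[:n]
    rest = counts.iloc[n:].sum()
    # Skip the 'other' slice when nothing is left over, to avoid an empty pie wedge
    if rest == 0:
        return top
    return pd.concat([top, pd.Series({"other": rest})])


# Hypothetical counts for illustration only
counts = pd.Series({"admin.": 50, "technician": 30, "student": 5, "retired": 15})
print(top_n_with_other(counts, 2))
```

The returned Series can be passed straight to `plt.pie` in place of `job_counts_top7`.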

📊 Visualizing subscribers to a term deposit by month.

Most subscriptions were recorded between April and August.

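# Ensure the 'month' column is ordered by calendar month rather than alphabetically
month_order = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
# Work on a copy so adding the column does not trigger SettingWithCopyWarning
subscribers = subscribers.copy()
subscribers['month'] = pd.Categorical(subscribers['month'], categories=month_order, ordered=True)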

# Get the count of subscribers by month
month_counts = subscribers['month'].value_counts().sort_index()

# Create a bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(month_counts.index, month_counts, color='skyblue')

# Add counts on top of the bars
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, int(yval), ha='center', va='bottom')

# Add labels and title
plt.xlabel('Month')
plt.ylabel('Number of Subscribers')
plt.title('Number of Subscribers by Month')
plt.xticks(rotation=45)
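plt.show()

# Calculate the total number of contacts ('campaign') made to subscribers in each month;
# observed=False keeps every month category, including months without records
total_contact_freq_by_month = subscribers.groupby('month', observed=False)['campaign'].sum()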

# Create a bar chart to visualize the total contacting frequency by month
plt.figure(figsize=(10, 6))
bars = plt.bar(total_contact_freq_by_month.index, total_contact_freq_by_month, color='lightcoral')

# Add total values on top of the bars
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, int(yval), ha='center', va='bottom')

# Add labels and title
plt.xlabel('Month')
plt.ylabel('Total Contacting Frequency')
plt.title('Total Contacting Frequency by Month')
plt.xticks(rotation=45)
plt.show()
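On a plain string column, `value_counts()` sorts by frequency and `groupby` sorts alphabetically ("apr", "aug", "dec", ...), which is why an explicit month order is needed. As an alternative to the categorical conversion above, a minimal sketch that reindexes the counts to calendar order and fills months with no records with 0; the month values here are hypothetical:

```python
import pandas as pd

MONTH_ORDER = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]

# Hypothetical month values for a handful of subscribers
months = pd.Series(["may", "aug", "may", "apr", "dec"])

# Count per month, then put the months in calendar order; absent months become 0
month_counts = months.value_counts().reindex(MONTH_ORDER, fill_value=0)
print(month_counts.loc[["apr", "may", "aug", "dec"]].tolist())  # [1, 2, 1, 1]
```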

🔎 What impact does the number of contacts performed during this campaign have on the likelihood that a customer subscribes to a term deposit?

👉 APPROACH: Created a binary column (subscribed) set to 1 for subscribers and 0 otherwise, grouped the rows by the number of contacts (campaign), and calculated the mean subscription rate for each group. Groups with a subscription rate of 0 were filtered out.

👉 KEY FINDINGS:

  • Fewer contacts were associated with a higher subscription rate.
  • The more contacts a client received, the less likely they were to subscribe.
import matplotlib.pyplot as plt
import numpy as np

# Create a new column 'subscribed' which is 1 if 'y' is 'yes' and 0 otherwise
bm['subscribed'] = (bm['y'] == 'yes').astype(int)

# Group by the number of contacts ('campaign') and calculate the mean subscription rate
campaign_subscription_rate = bm.groupby('campaign')['subscribed'].mean().reset_index()

# Filter out groups whose subscription rate is 0
campaign_subscription_rate = campaign_subscription_rate[campaign_subscription_rate['subscribed'] > 0]

contacts = campaign_subscription_rate['campaign']
rates = campaign_subscription_rate['subscribed']

# Draw the bars and the trend line at the same numeric x positions (the contact counts),
# so each point of the line sits on top of its bar
plt.figure(figsize=(12, 6))
plt.bar(contacts, rates, color=plt.cm.viridis(np.linspace(0, 1, len(contacts))))
plt.plot(contacts, rates, color='red', marker='o')

# Add labels and title
plt.xlabel('Number of Contacts During This Campaign')
plt.ylabel('Subscription Rate')
plt.title('Impact of Number of Contacts During This Campaign on Subscription Rate')

# Label every second contact count to keep the x-axis readable
plt.xticks(ticks=range(0, contacts.max() + 1, 2), rotation=45)

plt.show()
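The approach described above (binary flag, group by `campaign`, mean, drop zero-rate groups) can be packaged as one reusable function for any column. A minimal sketch; `subscription_rate_by` is a hypothetical helper and the toy rows are invented for illustration:

```python
import pandas as pd


def subscription_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: share of 'yes' in y for each value of `column`, zero-rate groups dropped."""
    subscribed = (df["y"] == "yes").astype(int)
    rate = subscribed.groupby(df[column]).mean()
    return rate[rate > 0]


# Hypothetical toy data: number of contacts and outcome
toy = pd.DataFrame({
    "campaign": [1, 1, 1, 2, 2, 3],
    "y": ["yes", "no", "yes", "yes", "no", "no"],
})
print(subscription_rate_by(toy, "campaign"))
```

Groups with very few clients can produce unstable rates, so it can help to inspect group sizes (for example `df.groupby(column).size()`) before reading a trend into the high-contact tail.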
import matplotlib.pyplot as plt
import seaborn as sns

# Plot the count of different values in the 'y' column
plt.figure(figsize=(10, 6))
ax = sns.countplot(data=bm, x='y')
plt.title('Distribution of Subscription Status (Yes/No) in the Marketing Campaign Dataset', pad=20)
plt.xlabel('Subscribed')
plt.ylabel('Count')

# Add the count of each category in the middle of the bar
for p in ax.patches:
    ax.annotate(f'{int(p.get_height()):,}', (p.get_x() + p.get_width() / 2, p.get_height() / 2), 
                ha='center', va='center', color='white', fontsize=12, fontweight='bold')

plt.show()
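The count plot shows how unbalanced the two outcomes are. The same information as proportions can be computed with `value_counts(normalize=True)`; a minimal sketch on hypothetical labels, not the real dataset:

```python
import pandas as pd

# Hypothetical outcome labels (not the real dataset)
y = pd.Series(["no"] * 8 + ["yes"] * 2)

# Fraction of rows in each class
shares = y.value_counts(normalize=True)
print(shares)  # no: 0.8, yes: 0.2
```

A baseline that always predicts "no" is already right for the majority share of rows, which is worth keeping in mind when judging any model built on this data.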

Study the relationship between the consumer price index (CPI), the consumer confidence index (CCI), and the subscription rate

The CPI suggests that prices were generally stable, while the CCI indicates that customers were reluctant to spend their money. Customers contacted during periods of higher consumer confidence were more likely to subscribe to the term deposit than those contacted when confidence was low.
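One way to check a relationship like this is to split the indicator into quantile bins and compute the subscription rate in each bin. A minimal sketch with `pd.qcut`, using a hypothetical toy frame whose column names follow the data dictionary (`cons.conf.idx`, `y`); the values are invented and `cci_bin` is an illustrative column name:

```python
import pandas as pd

# Hypothetical toy rows; values are invented, not taken from the dataset
toy = pd.DataFrame({
    "cons.conf.idx": [-50.0, -46.0, -42.0, -40.0, -36.0, -30.0],
    "y": ["no", "no", "no", "yes", "no", "yes"],
})

# Hypothetical bin column: split CCI into two equal-sized groups (lower vs. higher confidence)
toy["cci_bin"] = pd.qcut(toy["cons.conf.idx"], q=2, labels=["lower", "higher"])

# Subscription rate within each bin
rate = (toy["y"] == "yes").astype(int).groupby(toy["cci_bin"], observed=False).mean()
print(rate)
```

The same pattern applies to `cons.price.idx`; on the full data, more bins (for example `q=5`) give a finer view.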
