
Bank Marketing

This dataset consists of direct marketing campaigns by a Portuguese banking institution using phone calls. The campaigns aimed to sell subscriptions to a bank term deposit (see variable y).


Data Dictionary

| Column | Description | Class |
|---|---|---|
| age | age of customer | numeric |
| job | type of job | categorical: "admin.", "blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed", "services", "student", "technician", "unemployed", "unknown" |
| marital | marital status | categorical: "divorced", "married", "single", "unknown"; note: "divorced" means divorced or widowed |
| education | highest degree of customer | categorical: "basic.4y", "basic.6y", "basic.9y", "high.school", "illiterate", "professional.course", "university.degree", "unknown" |
| default | has credit in default? | categorical: "no", "yes", "unknown" |
| housing | has housing loan? | categorical: "no", "yes", "unknown" |
| loan | has personal loan? | categorical: "no", "yes", "unknown" |
| contact | contact communication type | categorical: "cellular", "telephone" |
| month | last contact month of year | categorical: "jan", "feb", "mar", ..., "nov", "dec" |
| day_of_week | last contact day of the week | categorical: "mon", "tue", "wed", "thu", "fri" |
| campaign | number of contacts performed during this campaign for this client | numeric; includes the last contact |
| pdays | number of days that passed after the client was last contacted in a previous campaign | numeric; 999 means the client was not previously contacted |
| previous | number of contacts performed before this campaign for this client | numeric |
| poutcome | outcome of the previous marketing campaign | categorical: "failure", "nonexistent", "success" |
| emp.var.rate | employment variation rate (quarterly indicator) | numeric |
| cons.price.idx | consumer price index (monthly indicator) | numeric |
| cons.conf.idx | consumer confidence index (monthly indicator) | numeric |
| euribor3m | Euribor 3-month rate (daily indicator) | numeric |
| nr.employed | number of employees (quarterly indicator) | numeric |
| y | has the client subscribed to a term deposit? | binary: "yes", "no" |
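The dictionary notes that `pdays` uses 999 as a sentinel for clients who were never contacted before. Left as-is, that value dominates means and histograms of `pdays`. A minimal sketch, on a hypothetical five-row frame rather than the real data, that splits the sentinel into its own flag; the column names `previously_contacted` and `pdays_clean` are illustrative, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real dataset; only pdays matters here.
toy = pd.DataFrame({"pdays": [999, 6, 999, 3, 999]})

# 999 is a sentinel meaning "never contacted before", not a real day count.
toy["previously_contacted"] = toy["pdays"] != 999
# Replace the sentinel with NaN so it does not inflate means or histograms.
toy["pdays_clean"] = toy["pdays"].replace(999, np.nan)

print(toy)
print("mean pdays (raw):", toy["pdays"].mean())
print("mean pdays (contacted clients only):", toy["pdays_clean"].mean())
```

The flag keeps the "never contacted" information as a feature, while the cleaned column describes only clients who actually had an earlier contact.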

Source of dataset.

Citations:

  • S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014.
  • S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal, October 2011. EUROSIS.

EDA

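import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from pandas.plotting import scatter_matrix

# Load the data (sep=";" matches a semicolon-delimited file)
bank = pd.read_csv("bank-marketing.csv", sep=";")

# First rows, shape, column dtypes and summary statistics
print(bank.head())
print(bank.shape)
bank.info()  # info() prints its report itself and returns None
print(bank.describe())

# Class balance of the target
print(bank['y'].value_counts())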
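`value_counts()` reports raw counts; proportions make the class imbalance in `y` easier to read and give a trivial baseline that any model should beat. A sketch on a hypothetical eight-value target, not the real data:

```python
import pandas as pd

# Hypothetical toy target column; the real one is bank['y'].
y = pd.Series(["no", "no", "yes", "no", "no", "no", "yes", "no"])

# Share of each class rather than raw counts.
shares = y.value_counts(normalize=True)
print(shares)

# A 0/1 version of the target makes rates easy to compute with mean().
y_binary = (y == "yes").astype(int)
print("subscription rate:", y_binary.mean())

# Always predicting the majority class achieves this accuracy.
baseline_accuracy = shares.max()
print("majority-class baseline accuracy:", baseline_accuracy)
```

On imbalanced data, accuracy close to this baseline says little about a model, which is why the proportion is worth checking before any modelling.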

Data visualization and analysis

# Overall distribution of the target
sns.countplot(x='y', data=bank)
plt.xlabel('Subscribed to term deposit')
plt.ylabel('Count')
plt.title('Distribution of Subscriptions')
plt.show()
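Axis labels are set by calling `plt.xlabel(...)`; writing `plt.xlabel = ...` instead rebinds the name to a string, so every later `plt.xlabel(...)` call raises a `TypeError`. One way to avoid that class of mistake is to draw on an explicit Axes object. A sketch using pandas plotting on a hypothetical toy target, with the non-interactive Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy target column; the real one is bank['y'].
y = pd.Series(["no", "no", "yes", "no", "yes", "no"])

fig, ax = plt.subplots()
# Bar chart of class counts, drawn on an explicit Axes object.
y.value_counts().plot(kind="bar", ax=ax)
# Axes.set() calls the setters; nothing in the pyplot namespace is rebound.
ax.set(xlabel="Subscribed to term deposit", ylabel="Count",
       title="Distribution of Subscriptions")
fig.tight_layout()
```

`sns.countplot` accepts the same object through its `ax=` parameter, e.g. `sns.countplot(x='y', data=bank, ax=ax)`.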
# Explore the relationship between age and subscription
sns.boxplot(x='y', y='age', data=bank)
plt.xlabel('Subscribed to term deposit')
plt.ylabel('Age')
plt.title('Age Distribution for Subscriptions')
plt.show()
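A boxplot is easier to interpret next to the numbers behind it. This sketch, on hypothetical toy ages rather than the real data, prints the quartiles of `age` for each value of `y`; these correspond to the box edges and the median line, although plotting libraries may compute quartiles with slightly different interpolation.

```python
import pandas as pd

# Hypothetical toy rows; the real data has many more.
toy = pd.DataFrame({
    "age": [25, 40, 61, 33, 58, 45, 29, 70],
    "y":   ["no", "no", "yes", "no", "yes", "no", "no", "yes"],
})

# Lower quartile, median and upper quartile of age per outcome.
age_summary = toy.groupby("y")["age"].describe()[["25%", "50%", "75%"]]
print(age_summary)
```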
# Job counts among clients who subscribed
job_subs = bank[bank['y'] == 'yes']['job'].value_counts()
job_subs.plot(kind='bar')
plt.xlabel('Job')
plt.ylabel('Count')
plt.title('Distribution of Jobs among Subscribers')
plt.show()
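Counting jobs among subscribers mixes two effects: how often each job subscribes and how many clients with that job were called at all. Dividing by group size gives a per-job subscription rate. A sketch on hypothetical toy rows, constructed so that the two views disagree:

```python
import pandas as pd

# Hypothetical toy rows; "admin." is over-represented on purpose.
toy = pd.DataFrame({
    "job": ["admin.", "admin.", "admin.", "admin.", "admin.", "admin.",
            "student", "retired", "retired"],
    "y":   ["yes", "yes", "no", "no", "no", "no",
            "yes", "yes", "no"],
})

# Raw counts among subscribers: driven partly by how many of each job were called.
subscriber_counts = toy.loc[toy["y"] == "yes", "job"].value_counts()

# Subscription rate within each job: normalises by group size.
rate_by_job = (toy["y"] == "yes").groupby(toy["job"]).mean().sort_values(ascending=False)

print(subscriber_counts)
print(rate_by_job)
```

Here "admin." has the most subscribers but the lowest rate. On the real data the same expression would be `(bank['y'] == 'yes').groupby(bank['job']).mean()`.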
# Analyze the subscription rate by education level
edu_subs = bank.groupby('education')['y'].value_counts(normalize=True).unstack()
edu_subs.plot(kind='bar', stacked=True)
plt.xlabel('Education Level')
plt.ylabel('Proportion')
plt.title('Subscription Rate by Education Level')
plt.legend(title='Subscribed to term deposit')
plt.show()
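The stacked bars above come out in alphabetical order, which puts `illiterate` between `high.school` and `professional.course`. Reindexing by an explicit schooling order fixes that. The ordering below (with `illiterate` first and `unknown` last) is an assumption made for display, not something the data dictionary specifies, and the data is a hypothetical toy frame.

```python
import pandas as pd

# Assumed display order, from least to most schooling; "unknown" kept last.
EDU_ORDER = ["illiterate", "basic.4y", "basic.6y", "basic.9y", "high.school",
             "professional.course", "university.degree", "unknown"]

# Hypothetical toy rows; the real frame is bank.
toy = pd.DataFrame({
    "education": ["university.degree", "basic.4y", "high.school", "basic.4y", "illiterate"],
    "y": ["yes", "no", "no", "yes", "no"],
})

# Same computation as edu_subs above; missing combinations filled with 0.
edu_subs = toy.groupby("education")["y"].value_counts(normalize=True).unstack(fill_value=0)
# Keep only levels that actually occur, in the schooling order defined above.
edu_subs = edu_subs.reindex([lvl for lvl in EDU_ORDER if lvl in edu_subs.index])
print(edu_subs)
```

For the real plot, the same `reindex` call can be applied to `edu_subs` before `edu_subs.plot(kind='bar', stacked=True)`.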
# Histograms of every numeric column
bank.hist(bins=10, figsize=(10, 15));
bank.columns
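Several columns (`emp.var.rate`, `cons.price.idx`, `cons.conf.idx`, `nr.employed`) contain dots, so they can only be reached with bracket indexing: `bank.emp.var.rate` would be parsed as a chain of attribute lookups. An optional sketch, on a hypothetical two-row frame, that renames such columns to use underscores:

```python
import pandas as pd

# Hypothetical two-row frame using two of the dotted names from the data dictionary.
toy = pd.DataFrame({"emp.var.rate": [1.1, -1.8], "cons.price.idx": [93.994, 92.893]})

# Dotted names need bracket access.
print(toy["emp.var.rate"])

# Optional: replace dots with underscores so attribute access also works.
toy_renamed = toy.rename(columns=lambda c: c.replace(".", "_"))
print(list(toy_renamed.columns))
print(toy_renamed.emp_var_rate.mean())
```

Renaming is a convenience only; keeping the original names and using brackets throughout works equally well.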
bank.isnull().sum()
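`isnull()` only counts true missing values (NaN/None). The data dictionary lists `"unknown"` as a category for `job`, `marital`, `education`, `default`, `housing` and `loan`, which presumably encodes missing information as a string that `isnull()` cannot see. A sketch on a hypothetical toy frame that counts the placeholder per column and, optionally, converts it to NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mixing categorical and numeric columns.
toy = pd.DataFrame({
    "job": ["admin.", "unknown", "technician"],
    "default": ["no", "unknown", "unknown"],
    "age": [30, 41, 52],
})

# isnull() finds nothing, because the placeholder is an ordinary string.
print(toy.isnull().sum())

# Count the "unknown" placeholder per column instead (numeric columns give 0).
unknown_counts = (toy == "unknown").sum()
print(unknown_counts)

# If NaN is preferred, convert the placeholder explicitly.
toy_nan = toy.replace("unknown", np.nan)
print(toy_nan.isnull().sum())
```

Whether to convert depends on the analysis: keeping `"unknown"` as its own category lets a model learn from the fact that a value was not recorded.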