Bank Marketing
This dataset consists of direct marketing campaigns by a Portuguese banking institution using phone calls. The campaigns aimed to sell subscriptions to a bank term deposit (see variable y).
Not sure where to begin? Scroll to the bottom to find challenges!
Data Dictionary
| Column | Variable | Class |
|---|---|---|
| age | age of customer | numeric |
| job | type of job | categorical: "admin.","blue-collar","entrepreneur","housemaid","management","retired","self-employed","services","student","technician","unemployed","unknown" |
| marital | marital status | categorical: "divorced","married","single","unknown"; note: "divorced" means divorced or widowed |
| education | highest degree of customer | categorical: "basic.4y","basic.6y","basic.9y","high.school","illiterate","professional.course","university.degree","unknown" |
| default | has credit in default? | categorical: "no","yes","unknown" |
| housing | has housing loan? | categorical: "no","yes","unknown" |
| loan | has personal loan? | categorical: "no","yes","unknown" |
| contact | contact communication type | categorical: "cellular","telephone" |
| month | last contact month of year | categorical: "jan", "feb", "mar", ..., "nov", "dec" |
| day_of_week | last contact day of the week | categorical: "mon","tue","wed","thu","fri" |
| campaign | number of contacts performed during this campaign and for this client | numeric, includes last contact |
| pdays | number of days that passed by after the client was last contacted from a previous campaign | numeric; 999 means client was not previously contacted |
| previous | number of contacts performed before this campaign and for this client | numeric |
| poutcome | outcome of the previous marketing campaign | categorical: "failure","nonexistent","success" |
| emp.var.rate | employment variation rate - quarterly indicator | numeric |
| cons.price.idx | consumer price index - monthly indicator | numeric |
| cons.conf.idx | consumer confidence index - monthly indicator | numeric |
| euribor3m | euribor 3 month rate - daily indicator | numeric |
| nr.employed | number of employees - quarterly indicator | numeric |
| y | has the client subscribed to a term deposit? | binary: "yes","no" |
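The dictionary notes that `pdays` uses 999 as a sentinel for clients who were not previously contacted. Left as a number, that value distorts means and histograms of `pdays`. The snippet below is a minimal sketch of one way to decode it. The `sample` frame, its values, the `decode_pdays` helper, and the `previously_contacted` column are hypothetical names introduced for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical rows that reuse column names from the data dictionary;
# the values are made up for illustration only.
sample = pd.DataFrame({
    "pdays": [999, 3, 999, 6],
    "previous": [0, 1, 0, 2],
    "poutcome": ["nonexistent", "success", "nonexistent", "failure"],
})


def decode_pdays(df: pd.DataFrame, sentinel: int = 999) -> pd.DataFrame:
    """Return a copy of df with the pdays sentinel made explicit."""
    out = df.copy()
    is_sentinel = out["pdays"] == sentinel
    # Boolean flag: True when the client was contacted in an earlier campaign.
    out["previously_contacted"] = ~is_sentinel
    # Replace the sentinel with NaN so it is excluded from numeric summaries.
    out["pdays"] = out["pdays"].mask(is_sentinel)
    return out


decoded = decode_pdays(sample)
print(decoded)
# Mean number of days, computed over previously contacted clients only.
print(decoded["pdays"].mean())
```

Keeping the flag as a separate column preserves the "never contacted" information for modelling after the sentinel itself is removed from `pdays`.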
Source of dataset.
Citations:
- S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
- S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimaraes, Portugal, October, 2011. EUROSIS.
EDA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from pandas.plotting import scatter_matrix
bank = pd.read_csv("bank-marketing.csv", sep=";")
bank.head()
print(bank.shape)
print(bank.info())
print(bank.describe())
print(bank['y'].value_counts())
Data visualization and analysis
sns.countplot(x='y', data=bank)
plt.xlabel('Subscribed to term deposit')
plt.ylabel('Count')
plt.title('Distribution of Subscriptions')
plt.show()
# Explore the relationship between age and subscription
sns.boxplot(x='y', y='age', data=bank)
plt.xlabel('Subscribed to term deposit')
plt.ylabel('Age')
plt.title('Age Distribution for Subscriptions')
plt.show()
# Count subscribers by job type
job_subs = bank[bank['y'] == 'yes']['job'].value_counts()
job_subs.plot(kind='bar')
plt.xlabel('Job')
plt.ylabel('Count')
plt.title('Distribution of Jobs among Subscribers')
plt.show()
# Analyze the subscription rate by education level
edu_subs = bank.groupby('education')['y'].value_counts(normalize=True).unstack()
edu_subs.plot(kind='bar', stacked=True)
plt.xlabel('Education Level')
plt.ylabel('Proportion')
plt.title('Subscription Rate by Education Level')
plt.legend(title='Subscribed to term deposit')
plt.show()
# Plot histograms of the numeric columns
bank.hist(bins=10, figsize=(10, 15));
bank.columns
bank.isnull().sum()
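`isnull()` only counts true NaN values. According to the data dictionary, several categorical columns encode missing information as the string "unknown", so a null check alone can report zero missing values while gaps remain. The snippet below is a rough sketch of how those placeholders could be counted. The `sample` frame and the `count_unknown` helper are hypothetical. On the real data the helper would be called on `bank`.

```python
import pandas as pd

# Hypothetical rows; the values are made up for illustration only.
sample = pd.DataFrame({
    "job": ["admin.", "unknown", "services", "technician"],
    "default": ["no", "unknown", "unknown", "no"],
    "age": [34, 51, 29, 42],
})


def count_unknown(df: pd.DataFrame, token: str = "unknown") -> pd.Series:
    """Count cells equal to token in each non-numeric column of df."""
    # Only non-numeric columns can hold the placeholder string.
    text_cols = df.select_dtypes(exclude="number")
    return (text_cols == token).sum()


unknown_counts = count_unknown(sample)
print(unknown_counts)
```

Whether to treat "unknown" as its own category or as missing data is a modelling choice; the count only shows how much of each column is affected.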