
Exploratory Data Analysis with R - Bank Marketing case

Bank Marketing

This dataset describes direct marketing campaigns, carried out by phone, from a Portuguese banking institution. The campaigns aimed to sell subscriptions to a bank term deposit (see variable y).

Source of dataset.

Management has asked for the following analyses:

  • ๐Ÿ—บ๏ธ Task 1 - Explore: What are the jobs of the people most likely to subscribe to a term deposit?
  • ๐Ÿ“Š Task 2 - Visualize: Create a plot to visualize the number of people subscribing to a term deposit by month.
  • ๐Ÿ”Ž Task 3 - Analyze: What impact does the number of contacts performed during the last campaign have on the likelihood that a customer subscribes to a term deposit?

Any additional insights are welcome, including the use of machine learning algorithms. The bank wants to make more use of such algorithms, and management would like to know how best to apply them. Ideally, compare several machine learning techniques.
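One way to frame "compare several machine learning techniques" is to fit every candidate model on the same training rows and score all of them on the same held-out rows. The sketch below assumes a hypothetical setup: the data are simulated (column names borrowed from the bank dataset, values and relationship made up), the candidates are a constant baseline and two logistic regressions fitted with `glm`, and the metric is the Brier score (mean squared error of predicted probabilities). These are illustrative choices, not the method of the original analysis.

```r
# Sketch of a model-comparison harness. The data are SIMULATED: the column
# names mirror the bank dataset, but the values and the relationship are made up.
set.seed(1)
n <- 1000
sim <- data.frame(
  campaign  = rpois(n, lambda = 2) + 1,
  euribor3m = runif(n, min = 0.6, max = 5)
)
sim$subscribed <- rbinom(n, size = 1,
                         prob = plogis(-0.5 - 0.3 * sim$campaign - 0.4 * sim$euribor3m))

# Hold out 30% of the rows for evaluation
test_idx <- sample(n, size = round(0.3 * n))
train <- sim[-test_idx, ]
test  <- sim[test_idx, ]

# Brier score: mean squared error of predicted probabilities (lower is better)
brier <- function(p, y) mean((p - y)^2)

# Each candidate takes training and test data and returns test-set probabilities
models <- list(
  baseline = function(tr, te) rep(mean(tr$subscribed), nrow(te)),
  logit_campaign = function(tr, te) {
    fit <- glm(subscribed ~ campaign, family = binomial, data = tr)
    predict(fit, newdata = te, type = "response")
  },
  logit_full = function(tr, te) {
    fit <- glm(subscribed ~ campaign + euribor3m, family = binomial, data = tr)
    predict(fit, newdata = te, type = "response")
  }
)

# Score every candidate on the same held-out rows
scores <- sapply(models, function(m) brier(m(train, test), test$subscribed))
sort(scores)
```

Keeping each model behind the same function signature means further learners (for example a tree-based model from an add-on package) could be appended to `models` as extra list entries, as long as they return probabilities for the test rows. With the real data, `sim` would be replaced by `bank` with a 0/1 `subscribed` column derived from `y`.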

Background:

You work for a financial services firm. The past few campaigns have not gone as well as the firm would have hoped, and they are looking for ways to optimize their marketing efforts.

They have supplied you with data from a previous campaign and some additional metrics such as the consumer price index and consumer confidence index. They want to know whether you can predict the likelihood of subscribing to a term deposit. The manager would also like to know what factors are most likely to increase a customer's probability of subscribing.

You will need to prepare a report that includes the raw data, the code, and the techniques used. The results will later be cleaned up into a more polished report for a broad audience.

Step 1. Import data

# Import the dataset (the file is semicolon-delimited)

library(tidyverse)

bank <- read_delim('data/bank-marketing.csv', delim=";", show_col_types = FALSE)

Data Dictionary

| Column | Variable | Class |
|---|---|---|
| age | age of customer | numeric |
| job | type of job | categorical: "admin.", "blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed", "services", "student", "technician", "unemployed", "unknown" |
| marital | marital status | categorical: "divorced", "married", "single", "unknown"; note: "divorced" means divorced or widowed |
| education | highest degree of customer | categorical: "basic.4y", "basic.6y", "basic.9y", "high.school", "illiterate", "professional.course", "university.degree", "unknown" |
| default | has credit in default? | categorical: "no", "yes", "unknown" |
| housing | has housing loan? | categorical: "no", "yes", "unknown" |
| loan | has personal loan? | categorical: "no", "yes", "unknown" |
| contact | contact communication type | categorical: "cellular", "telephone" |
| month | last contact month of year | categorical: "jan", "feb", "mar", ..., "nov", "dec" |
| day_of_week | last contact day of the week | categorical: "mon", "tue", "wed", "thu", "fri" |
| campaign | number of contacts performed during this campaign and for this client | numeric; includes last contact |
| pdays | number of days that passed after the client was last contacted from a previous campaign | numeric; 999 means client was not previously contacted |
| previous | number of contacts performed before this campaign and for this client | numeric |
| poutcome | outcome of the previous marketing campaign | categorical: "failure", "nonexistent", "success" |
| emp.var.rate | employment variation rate - quarterly indicator | numeric |
| cons.price.idx | consumer price index - monthly indicator | numeric |
| cons.conf.idx | consumer confidence index - monthly indicator | numeric |
| euribor3m | euribor 3 month rate - daily indicator | numeric |
| nr.employed | number of employees - quarterly indicator | numeric |
| y | has the client subscribed to a term deposit? | binary: "yes", "no" |
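The dictionary marks 999 in `pdays` as "not previously contacted", so leaving it as a number would inflate any mean or model that uses `pdays`. A minimal sketch of one way to handle this, assuming a hypothetical helper `prepare_bank()` (not part of the original analysis) that flags prior contact, turns the sentinel into `NA`, and adds a 0/1 `subscribed` column so that its mean is a subscription rate. A tiny made-up data frame stands in for `bank`.

```r
# Hypothetical helper, not part of the original analysis
prepare_bank <- function(df) {
  # 999 in pdays is a sentinel meaning "never contacted", not a real day count
  df$previously_contacted <- df$pdays != 999
  df$pdays[df$pdays == 999] <- NA
  # Encode the target as 0/1 so that its mean is a subscription rate
  df$subscribed <- as.integer(df$y == "yes")
  df
}

# Made-up rows standing in for `bank`
toy <- data.frame(
  pdays = c(999, 3, 999, 6),
  y     = c("no", "yes", "no", "yes")
)
prepared <- prepare_bank(toy)
prepared
```

With the real data this would be called as `bank <- prepare_bank(bank)`; the function only uses base R subsetting, so it works on the tibble returned by `read_delim()` as well.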

Step 2. Explore and investigate data

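# Preview the first rows, the column types, and per-column summary statistics
head(bank)
str(bank)
summary(bank)
# Number of rows, then rows and columns together
nrow(bank)
dim(bank)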
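Before answering the tasks, it can help to know the overall share of "yes" answers in `y`: it is the baseline against which per-job, per-month, or per-contact rates can be judged, and the share of the most common class is the accuracy of a trivial model that always predicts it. A minimal sketch, assuming a hypothetical helper `class_balance()` and made-up values in place of `bank$y`:

```r
# Hypothetical helper: share of each target value
class_balance <- function(y) prop.table(table(y))

toy_y <- c("no", "no", "no", "yes")  # made-up values, not from the dataset
balance <- class_balance(toy_y)
balance

# Accuracy of a trivial model that always predicts the most common class
baseline_accuracy <- max(balance)
baseline_accuracy
```

With the real data the call would be `class_balance(bank$y)`.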

Exploring task 1 - What are the jobs of the people most likely to subscribe to a term deposit?

# Keep only the clients who subscribed to a term deposit
subscribed_data <- bank %>% filter(y == "yes")

# Count subscribers per job category, largest first
# (raw counts also reflect how many clients each job group has)
sort(table(subscribed_data$job), decreasing = TRUE)
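Counting subscribers per job answers "which jobs produce the most subscriptions", which favors large job groups. "Most likely to subscribe" asks for a rate: within each job, the share of clients with `y == "yes"`. A minimal sketch, assuming a hypothetical helper `subscription_rate()` and a made-up data frame in which "admin." has the most subscribers but the lowest rate:

```r
# Hypothetical helper: share of clients in each group with y == "yes", highest first
subscription_rate <- function(df, group) {
  rates <- tapply(df$y == "yes", df[[group]], mean)
  # Convert the 1-d array from tapply() to a plain named vector
  rates <- setNames(as.vector(rates), names(rates))
  sort(rates, decreasing = TRUE)
}

# Made-up rows standing in for `bank`
toy <- data.frame(
  job = c(rep("admin.", 6), "retired", "student", "student"),
  y   = c("yes", "yes", "no", "no", "no", "no", "yes", "yes", "no")
)
job_rates <- subscription_rate(toy, "job")
job_rates
```

With the real data the call would be `subscription_rate(bank, "job")`. Small groups can show extreme rates by chance, so it is worth reading the rates alongside the group sizes from `table(bank$job)`.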

Visualizing task 2 - Create a plot to visualize the number of people subscribing to a term deposit by month

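# Order months chronologically (month holds lowercase abbreviations like "mar")
subscribed_data <- subscribed_data %>%
  mutate(month = factor(month, levels = tolower(month.abb)))
# Create the bar plot
ggplot(data = subscribed_data, aes(x = month)) +
  geom_bar(fill = "steelblue") +
  labs(x = "Month", y = "Number of Subscriptions", title = "Subscriptions by Month")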
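The number of subscriptions in a month also depends on how many calls were made that month, so a month with many subscriptions is not necessarily a month with a high success rate. A minimal sketch that puts contacts, subscriptions, and the subscription rate side by side in calendar order, assuming a hypothetical helper `monthly_summary()` and made-up rows in place of `bank`:

```r
# Hypothetical helper: contacts, subscriptions and rate per month, in calendar order
monthly_summary <- function(df) {
  month <- factor(df$month, levels = tolower(month.abb))
  out <- data.frame(
    month = levels(month),
    contacts = as.vector(table(month)),
    subscriptions = as.vector(table(month[df$y == "yes"])),
    stringsAsFactors = FALSE
  )
  out <- out[out$contacts > 0, ]  # drop months with no calls
  out$rate <- out$subscriptions / out$contacts
  rownames(out) <- NULL
  out
}

# Made-up rows standing in for `bank`
toy <- data.frame(
  month = c("may", "may", "may", "may", "mar", "mar"),
  y     = c("yes", "no", "no", "no", "yes", "yes")
)
by_month <- monthly_summary(toy)
by_month
```

With the real data the call would be `monthly_summary(bank)`; the `rate` column could then be plotted with `geom_col()` next to the count plot.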

Analyzing task 3 - What impact does the number of contacts performed during the last campaign have on the likelihood that a customer subscribes to a term deposit?
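Two complementary views can address this question: the subscription rate for each number of contacts (`campaign`), and a logistic regression of subscription on `campaign`, whose exponentiated coefficient is the multiplicative change in the odds of subscribing per additional contact. A minimal sketch, assuming hypothetical helpers `rate_by_contacts()` and `fit_contacts()`, an arbitrary cap that pools 5 or more contacts into one group, and made-up rows in place of `bank`:

```r
# Hypothetical helper: subscription rate by number of contacts in this campaign.
# Counts at or above `cap` are pooled into one group; cap = 5 is an arbitrary choice.
rate_by_contacts <- function(df, cap = 5) {
  groups <- c(as.character(seq_len(cap - 1)), paste0(cap, "+"))
  contacts <- ifelse(df$campaign >= cap, paste0(cap, "+"), as.character(df$campaign))
  contacts <- factor(contacts, levels = groups)
  data.frame(
    contacts = groups,
    customers = as.vector(table(contacts)),
    rate = as.vector(tapply(df$y == "yes", contacts, mean)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: logistic regression of subscription on the raw contact count
fit_contacts <- function(df) {
  df$subscribed <- as.integer(df$y == "yes")
  glm(subscribed ~ campaign, family = binomial, data = df)
}

# Made-up rows standing in for `bank`
toy <- data.frame(
  campaign = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6),
  y = c("yes", "yes", "yes", "no", "yes", "no", "no", "no", "no", "no", "no", "no")
)
rates <- rate_by_contacts(toy)
rates

fit <- fit_contacts(toy)
# Odds ratio per additional contact (values below 1 mean lower odds)
exp(coef(fit)[["campaign"]])
```

With the real data the calls would be `rate_by_contacts(bank)` and `fit_contacts(bank)`. The regression describes an association only: clients who are harder to convince may simply receive more calls, so a declining rate does not by itself show that extra calls cause fewer subscriptions.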
