Bank Marketing
This dataset consists of direct marketing campaigns by a Portuguese banking institution using phone calls. The campaigns aimed to sell subscriptions to a bank term deposit (see variable y).
import pandas as pd
bm = pd.read_csv("bank-marketing.csv", sep=";")
bm.head(4000)
Data Dictionary
| Column | Variable | Class |
|---|---|---|
| age | age of customer | numeric |
| job | type of job | categorical: "admin.","blue-collar","entrepreneur","housemaid","management","retired","self-employed","services","student","technician","unemployed","unknown" |
| marital | marital status | categorical: "divorced","married","single","unknown"; note: "divorced" means divorced or widowed |
| education | highest degree of customer | categorical: "basic.4y","basic.6y","basic.9y","high.school","illiterate","professional.course","university.degree","unknown" |
| default | has credit in default? | categorical: "no","yes","unknown" |
| housing | has housing loan? | categorical: "no","yes","unknown" |
| loan | has personal loan? | categorical: "no","yes","unknown" |
| contact | contact communication type | categorical: "cellular","telephone" |
| month | last contact month of year | categorical: "jan", "feb", "mar", ..., "nov", "dec" |
| day_of_week | last contact day of the week | categorical: "mon","tue","wed","thu","fri" |
| campaign | number of contacts performed during this campaign and for this client | numeric, includes last contact |
| pdays | number of days that passed by after the client was last contacted from a previous campaign | numeric; 999 means client was not previously contacted |
| previous | number of contacts performed before this campaign and for this client | numeric |
| poutcome | outcome of the previous marketing campaign | categorical: "failure","nonexistent","success" |
| emp.var.rate | employment variation rate - quarterly indicator | numeric |
| cons.price.idx | consumer price index - monthly indicator | numeric |
| cons.conf.idx | consumer confidence index - monthly indicator | numeric |
| euribor3m | euribor 3 month rate - daily indicator | numeric |
| nr.employed | number of employees - quarterly indicator | numeric |
| y | has the client subscribed to a term deposit? | binary: "yes","no" |
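The data dictionary notes that pdays uses 999 as a sentinel for "not previously contacted", so averaging the raw column would be misleading. Below is a minimal sketch of one way to handle it. The toy values and the column names previously_contacted and pdays_clean are assumptions made for illustration, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real file; the values are illustrative only.
toy = pd.DataFrame({
    "pdays": [999, 3, 999, 6],
    "y": ["no", "yes", "no", "yes"],
})

# Hypothetical helper columns: flag clients with an earlier contact,
# then replace the 999 sentinel with NaN so it is treated as missing.
toy["previously_contacted"] = toy["pdays"] != 999
toy["pdays_clean"] = toy["pdays"].replace(999, np.nan)

# The mean now skips the sentinel rows instead of being pulled toward 999.
mean_days = toy["pdays_clean"].mean()
print(mean_days)
```

Keeping the boolean flag alongside the cleaned column preserves the "never contacted" information that would otherwise be lost when 999 becomes missing.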
Source of dataset.
Citations:
- S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
- S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimaraes, Portugal, October, 2011. EUROSIS.
What are the jobs of the people most likely to subscribe to a term deposit?
Over 50% of the term deposit subscribers were administrators, technicians, or blue-collar workers.
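A job's share of all subscribers and the subscription rate within that job can differ, because large job groups can dominate the subscriber counts even when their rate is low. The sketch below shows both measures side by side on a small made-up frame. The toy values and the names share_of_subscribers and rate_by_job are assumptions made for illustration, not results from the dataset.

```python
import pandas as pd

# Toy data, invented for illustration: a large group and a small group.
toy = pd.DataFrame({
    "job": ["admin."] * 6 + ["student"] * 2,
    "y": ["yes", "yes", "no", "no", "no", "no", "yes", "no"],
})

subscribed = toy["y"].eq("yes")

# Share of all subscribers that each job accounts for (what a pie chart shows).
share_of_subscribers = toy.loc[subscribed, "job"].value_counts(normalize=True)

# Subscription rate within each job (closer to "most likely to subscribe").
rate_by_job = subscribed.groupby(toy["job"]).mean().sort_values(ascending=False)

print(share_of_subscribers)
print(rate_by_job)
```

In this toy frame the larger group holds most of the subscribers while the smaller group has the higher rate, which is why it may be worth checking both on the real data.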
import matplotlib.pyplot as plt
import pandas as pd
# Assuming bm is already defined as a DataFrame
# Filter the dataframe to include only those who subscribed to a term deposit
subscribers = bm[bm['y'] == 'yes'].copy()  # copy so that columns added later do not trigger SettingWithCopyWarning
# Count the number of subscribers for each job
job_counts = subscribers['job'].value_counts()
# Keep the top 7 jobs and label the rest as "other"
top_jobs = job_counts.nlargest(7)
other_jobs_count = job_counts.iloc[7:].sum()
job_counts_top7 = pd.concat([top_jobs, pd.Series(other_jobs_count, index=['other'])])
# Plot a pie chart
plt.figure(figsize=(10, 8))
plt.pie(job_counts_top7, labels=job_counts_top7.index, autopct='%1.1f%%', startangle=140)
plt.title('Job Distribution of Term Deposit Subscribers', pad=30) # Increase the padding between the title and the chart
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
Visualizing subscribers to a term deposit by month.
Subscriptions peaked between April and August.
# Ensure the 'month' column is ordered correctly
month_order = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
subscribers['month'] = pd.Categorical(subscribers['month'], categories=month_order, ordered=True)
# Get the count of subscribers by month
month_counts = subscribers['month'].value_counts().sort_index()
# Create a bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(month_counts.index, month_counts, color='skyblue')
# Add counts on top of the bars
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, int(yval), ha='center', va='bottom')
# Add labels and title
plt.xlabel('Month')
plt.ylabel('Number of Subscribers')
plt.title('Number of Subscribers by Month')
plt.xticks(rotation=45)
plt.show()
# Calculate the total contacting frequency (campaign) by month
total_contact_freq_by_month = subscribers.groupby('month', observed=False)['campaign'].sum()  # observed=False keeps months with no rows
# Create a bar chart to visualize the total contacting frequency by month
plt.figure(figsize=(10, 6))
bars = plt.bar(total_contact_freq_by_month.index, total_contact_freq_by_month, color='lightcoral')
# Add total values on top of the bars
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, int(yval), ha='center', va='bottom')
# Add labels and title
plt.xlabel('Month')
plt.ylabel('Total Contacting Frequency')
plt.title('Total Contacting Frequency by Month')
plt.xticks(rotation=45)
plt.show()
What impact does the number of contacts performed during the last campaign have on the likelihood that a customer subscribes to a term deposit?
APPROACH:
A binary column (subscribed) was created with a value of 1 for subscribers and 0 otherwise. The data was then grouped by the number of contacts (campaign), and the mean subscription rate was calculated for each group. Groups with a subscription rate of 0 were filtered out.
KEY FINDINGS:
- A low number of contacts was associated with a higher subscription rate.
- The more often a customer was contacted, the less likely they were to subscribe.
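The approach described above can be sketched on a small made-up frame to show what each step produces. The toy values and the names rate, n, and rate_nonzero are assumptions made for illustration. The group size n is an addition not mentioned in the approach, included because rates for high contact counts may rest on very few customers.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "campaign": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "y": ["yes", "yes", "no", "no", "yes", "no", "no", "no", "no", "no"],
})

# Step 1: binary target, 1 for "yes" and 0 otherwise.
toy["subscribed"] = (toy["y"] == "yes").astype(int)

# Step 2: mean subscription rate per number of contacts, plus the group size.
rate = (
    toy.groupby("campaign")["subscribed"]
    .agg(rate="mean", n="size")
    .reset_index()
)

# Step 3: drop groups whose subscription rate is 0.
rate_nonzero = rate[rate["rate"] > 0]
print(rate_nonzero)
```

Looking at n next to each rate helps judge whether a drop at high contact counts reflects a trend or just a handful of rows.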
import seaborn as sns
# Create a new column 'subscribed' which is 1 if 'y' is 'yes' and 0 otherwise
bm['subscribed'] = (bm['y'] == 'yes').astype(int)
# Group by the number of contacts performed during the last campaign and calculate the mean subscription rate
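campaign_subscription_rate = bm.groupby('campaign')['subscribed'].mean().reset_index()
# Filter out groups with a subscription rate of 0, as described in the approach
campaign_subscription_rate = campaign_subscription_rate[campaign_subscription_rate['subscribed'] > 0]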
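# Create a bar plot on a numeric x-axis so the bars, trend line, and x-ticks share the same positions
# (sns.barplot treats x as categorical, which would misalign the line and the tick labels)
plt.figure(figsize=(12, 6))
bar_colors = sns.color_palette('viridis', n_colors=len(campaign_subscription_rate))
plt.bar(campaign_subscription_rate['campaign'], campaign_subscription_rate['subscribed'], color=bar_colors)
# Draw a line to show the tendency
plt.plot(campaign_subscription_rate['campaign'], campaign_subscription_rate['subscribed'], color='red', marker='o')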
# Add labels and title
plt.xlabel('Number of Contacts During Last Campaign')
plt.ylabel('Subscription Rate')
plt.title('Impact of Number of Contacts During Last Campaign on Subscription Rate')
# Adjust the x-axis to make the values more readable by increasing the distance between the X values
plt.xticks(ticks=range(0, campaign_subscription_rate['campaign'].max() + 1, 2), rotation=45)
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
# Plot the count of different values in the 'y' column
plt.figure(figsize=(10, 6))
ax = sns.countplot(data=bm, x='y')
plt.title('Distribution of Subscription Status (Yes/No) in the Marketing Campaign Dataset', pad=20)
plt.xlabel('Subscribed')
plt.ylabel('Count')
# Add the count of each category in the middle of the bar
for p in ax.patches:
    ax.annotate(f'{int(p.get_height()):,}', (p.get_x() + p.get_width() / 2, p.get_height() / 2),
                ha='center', va='center', color='white', fontsize=12, fontweight='bold')
plt.show()
Study the relationship between CPI, CCI, and the subscription rate
The CPI indicates that the economy is generally stable, but customers are wary of spending their money. Optimistic customers are more likely to subscribe to a term deposit, while pessimistic customers are less likely to.
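One way to examine this relationship is to bin the consumer confidence index and compare subscription rates across bins, alongside a simple correlation with both indices. The sketch below runs on made-up rows. The toy values and the names cci_bin, rate_by_cci, and corr are assumptions made for illustration, not results from the dataset. According to the data dictionary, cons.price.idx and cons.conf.idx are monthly indicators, so any pattern describes the period of the call rather than the sentiment of an individual customer.

```python
import pandas as pd

# Toy rows, invented for illustration only.
toy = pd.DataFrame({
    "cons.conf.idx": [-50.0, -48.0, -46.0, -42.0, -40.0, -36.0, -34.0, -30.0],
    "cons.price.idx": [92.9, 93.0, 93.2, 93.4, 93.9, 94.0, 94.2, 94.5],
    "y": ["no", "no", "no", "yes", "no", "yes", "yes", "yes"],
})
toy["subscribed"] = (toy["y"] == "yes").astype(int)

# Split consumer confidence into two equal-sized bins (hypothetical labels).
toy["cci_bin"] = pd.qcut(toy["cons.conf.idx"], q=2, labels=["lower", "higher"])

# Subscription rate within each confidence bin.
rate_by_cci = toy.groupby("cci_bin", observed=True)["subscribed"].mean()

# Pearson correlation of each index with the binary target.
corr = toy[["cons.price.idx", "cons.conf.idx", "subscribed"]].corr()["subscribed"]

print(rate_by_cci)
print(corr)
```

On the real data, more bins and a look at group sizes would give a fuller picture. A correlation on its own does not show that confidence causes subscriptions.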