Credit Card Recommendation
Credit Card Recommendation in Banking
“Golden Horizon Bank” is a private bank that offers its customers a range of products, such as savings accounts, home loans, car loans and credit cards. The bank's new manager has noticed that income from credit cards is low compared with the other services and has decided to act to increase credit card earnings. As a first step, he wants to recommend credit cards to the bank's existing customers rather than to new ones, because existing customers already trust the bank. Knowing the potential of data in banking, he instructed the data science team lead to build a model that predicts which customers are more likely to buy a credit card, and the team lead assigned the task to you. Try to build a model that helps the bank get the most out of its efforts to sell more credit cards. Good luck!
Data Set Information:
Total Entries: 245,725
Number of Columns: 11
Column Types:
4 columns of type int64 (Age, Vintage, Avg_Account_Balance, Need_Credit_Card)
7 object type columns (User_ID, Gender, Area_Code, Profession, Channel_Code, Has_Credit, Is_Active)
Important Columns:
User_ID: Customer ID
Gender: Gender
Age: Age
Area_Code: Area code
Profession: Profession
Channel_Code: Channel code used
Vintage: Duration of the relationship with the bank
Has_Credit: Whether the customer already has a credit card
Avg_Account_Balance: Average account balance
Is_Active: Whether the account is active
Need_Credit_Card: Whether the customer needs a credit card (the value the model should predict)
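The column description above can be turned into a quick sanity check before any analysis. The following is only a minimal sketch: the helper name `schema_problems` and the two-row `sample` frame are made up for illustration and are not part of the data set or of pandas.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Column groups taken from the "Column Types" list above.
INTEGER_COLUMNS = ["Age", "Vintage", "Avg_Account_Balance", "Need_Credit_Card"]
TEXT_COLUMNS = ["User_ID", "Gender", "Area_Code", "Profession",
                "Channel_Code", "Has_Credit", "Is_Active"]

def schema_problems(frame):
    """Return a list of schema mismatches (hypothetical helper; empty list if none)."""
    problems = []
    for column in INTEGER_COLUMNS + TEXT_COLUMNS:
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
        elif column in INTEGER_COLUMNS and not is_integer_dtype(frame[column]):
            problems.append(f"{column}: expected an integer dtype, got {frame[column].dtype}")
        elif column in TEXT_COLUMNS and is_numeric_dtype(frame[column]):
            problems.append(f"{column}: expected text, got {frame[column].dtype}")
    return problems

# Made-up rows, only to exercise the check; the values are not from the real file.
sample = pd.DataFrame({
    "User_ID": ["U1", "U2"], "Gender": ["Female", "Male"], "Age": [30, 55],
    "Area_Code": ["A1", "A2"], "Profession": ["Salaried", "Other"],
    "Channel_Code": ["C1", "C2"], "Vintage": [12, 80],
    "Has_Credit": ["No", "Yes"], "Avg_Account_Balance": [500000, 1200000],
    "Is_Active": ["Yes", "No"], "Need_Credit_Card": [0, 1],
})
print(schema_problems(sample))  # an empty list means the frame matches the description
```

Running `schema_problems(df)` on the loaded file would flag a renamed or mistyped column early, before it surfaces as a confusing error further down.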
Basic Information & Summary
import pandas as pd
import numpy as np
df = pd.read_csv("credit_card_recommendation.csv")
df.info()  # Column dtypes, non-null counts and memory usage
df.head(5)
Data Set Statistics & Missing Data
def check_data(df, head=5):
    """Print a quick structural and statistical overview of a DataFrame."""
    print("######## SHAPE ########")  # (rows, columns)
    print(df.shape)
    print("######## TYPES ########")  # Data type of each column
    print(df.dtypes)
    print("######## HEAD ########")  # First `head` rows
    print(df.head(head))
    print("######## TAIL ########")  # Last `head` rows
    print(df.tail(head))
    print("######## NaN ########")  # Missing values per column
    print(df.isnull().sum())
    print("######## DESCRIBE ########")  # Summary statistics of numeric columns
    print(df.describe())
    print("######## INDEX ########")  # Row index
    print(df.index)
    print("######## COLUMNS ########")  # Column labels
    print(df.columns)
    print("######## COUNT ########")  # Non-NA values per column
    print(df.count())
check_data(df)
- Age: mean 43.86, minimum 23, maximum 85.
- Vintage: mean 46.96, minimum 7, maximum 135.
- Avg_Account_Balance: mean 1,128,403.10, minimum 20,790, maximum 10,352,009.
- Need_Credit_Card: 23.72 per cent of the entries are positive.
- Has_Credit: 29,325 values are missing.
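The figures above can also be pulled out individually rather than read off the `describe()` printout. Below is a minimal sketch on a made-up four-row frame (`toy` and its values are illustrative assumptions, not rows from the data set); the same three expressions apply unchanged to `df`.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Age": [23, 40, 60, 85],
    "Need_Credit_Card": [0, 1, 0, 0],
    "Has_Credit": ["Yes", None, "No", "No"],
})

# Mean, minimum and maximum of a numeric column in one call.
age_summary = toy["Age"].agg(["mean", "min", "max"])

# The target is coded 0/1, so its mean is the share of positives.
positive_rate = toy["Need_Credit_Card"].mean() * 100

# Number of missing entries in a single column.
missing_has_credit = toy["Has_Credit"].isnull().sum()

print(age_summary)
print(f"Positive share: {positive_rate:.2f}%")
print(f"Missing Has_Credit values: {missing_has_credit}")
```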
# Fill missing values with the most frequent value (mode); a mean is not defined for a categorical column
df['Has_Credit'] = df['Has_Credit'].fillna(df['Has_Credit'].mode()[0])
# Check the data set again
check_data(df)
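To see what the mode-based fill does to the category counts, here is a minimal sketch on a made-up five-row column (`toy` and its values are assumptions for illustration). It relies on `Series.mode()` ignoring missing values by default.

```python
import pandas as pd

# Made-up column with two missing entries.
toy = pd.DataFrame({"Has_Credit": ["No", "Yes", None, "No", None]})

# mode() skips missing values, so this is the most frequent observed category.
most_frequent = toy["Has_Credit"].mode()[0]

# Assign the result back instead of using inplace=True on a column selection.
filled = toy["Has_Credit"].fillna(most_frequent)

print(toy["Has_Credit"].value_counts(dropna=False))  # counts before, missing values included
print(filled.value_counts())                         # counts after the fill
```

Every missing entry ends up in the majority category, so that category's share grows. With about 12 per cent of Has_Credit missing in this data set (29,325 of 245,725 rows), it may be worth comparing the counts before and after the fill.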
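Visualization
Several visualizations are drawn from the data set: a line chart of average account balance by age, a grouped bar chart of average account balance by profession and credit card ownership, a scatter plot of average account balance and credit card need by bank relationship duration, a connected scatter plot of relationship duration by age, a bubble chart combining age, relationship duration, account balance and credit card need, and a word cloud of professions.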
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
# Line Plot
plt.figure(figsize=(12, 6))
sns.lineplot(x='Age', y='Avg_Account_Balance', data=df, errorbar=None)  # seaborn >= 0.12; older versions use ci=None
plt.title('Average Account Balance by Age')
plt.show()
# Grouped Bar Plot (hue draws side-by-side bars; sns.barplot does not stack them)
plt.figure(figsize=(12, 6))
sns.barplot(x='Profession', y='Avg_Account_Balance', hue='Has_Credit', data=df)
plt.title('Credit Card Ownership and Average Account Balance Across Occupations')
plt.show()
# Scatter Plot
plt.figure(figsize=(12, 6))
sns.scatterplot(x='Vintage', y='Avg_Account_Balance', hue='Need_Credit_Card', data=df)
plt.title('Average Account Balance and Credit Card Need by Bank Relationship Duration')
plt.show()
# Connected Scatter Plot
plt.figure(figsize=(12, 6))
sns.lineplot(x='Age', y='Vintage', data=df, sort=False)
plt.title('Bank Relationship Duration by Age')
plt.show()
# Bubble Graph
plt.figure(figsize=(12, 6))
sns.scatterplot(x='Age', y='Vintage', size='Avg_Account_Balance', hue='Need_Credit_Card', data=df)
plt.title('Credit Card Need and Average Account Balance by Age and Bank Relationship Duration')
plt.show()
# Word Cloud
wordcloud = WordCloud(width=800, height=400, random_state=42, background_color='white').generate(' '.join(df['Profession']))
plt.figure(figsize=(12, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Occupations Word Cloud')
plt.show()
Modelling