Predicting Credit Card Approvals

Setup

1. Importing necessary libraries

import pandas as pd
import numpy as np

2. Loading dataset

cc_apps = pd.read_csv('datasets/cc_approvals.data', header=None)
cc_apps
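
The file has no header row, so `header=None` tells pandas to keep the first line as data and label the columns with integers starting at 0. A minimal sketch of that behavior, assuming a small made-up sample (the two rows below are illustrative, not taken from the dataset):

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for a headerless CSV file
sample_csv = io.StringIO("b,30.83,0.0,+\na,58.67,4.46,-\n")

sample = pd.read_csv(sample_csv, header=None)

# With header=None the first line stays in the data and columns are labelled 0..n-1
print(list(sample.columns))
print(sample.shape)
```

Without `header=None`, pandas would treat the first application as column names and silently lose one row.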

3. Inspecting the applications

The output may look confusing at first sight, but let's try to identify the most important features of a credit card application. The features of this dataset have been anonymized to protect privacy, but this blog gives us a pretty good overview of the probable features. The probable features in a typical credit card application are:

0 Gender
1 Age
2 Debt
3 Married
4 BankCustomer
5 EducationLevel
6 Ethnicity
7 YearsEmployed
8 PriorDefault
9 Employed
10 CreditScore
11 DriversLicense
12 Citizen
13 ZipCode
14 Income
15 ApprovalStatus

This gives us a pretty good starting point, and we can map these features with respect to the columns in the output.

# Add header
cc_apps.columns = ["gender", "age", "debt", "married", "bank_customer", "education_level", "ethnicity", "years_employed", "prior_default", "employed", "credit_score", "driver_license", "citizen", "zip_code", "income", "approval_status"]

# Inspect some rows
print("Head")
print(cc_apps.head())

# Print summary statistics
print("\n")
print("Description")
print(cc_apps.describe())

# Print DataFrame information
print("\n")
print("Info")
cc_apps.info()  # info() prints its report itself and returns None, so no print() is needed
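
When reading the summary statistics, keep in mind that `describe()` covers only numeric columns by default, so any column stored as text is left out of that table. A small sketch on made-up data (the frame `toy` and its values are hypothetical) shows how `include="all"` brings the non-numeric columns in:

```python
import pandas as pd

# Hypothetical toy data with one text column and one numeric column
toy = pd.DataFrame({
    "gender": ["a", "b", "b"],
    "debt": [0.0, 4.46, 0.5],
})

numeric_summary = toy.describe()             # numeric columns only
full_summary = toy.describe(include="all")   # adds count/unique/top/freq for text columns

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```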

4. Splitting the dataset into "features" and "target"

target_column = 'approval_status'
X = cc_apps.drop(columns=[target_column])
y = cc_apps[target_column]
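
The same split can be checked on a tiny stand-in frame. This is a sketch under the assumption of made-up values; `toy`, `X_toy`, and `y_toy` are hypothetical names used only for illustration:

```python
import pandas as pd

# Hypothetical toy data that mimics three of the renamed columns
toy = pd.DataFrame({
    "debt": [0.0, 4.46],
    "income": [0, 560],
    "approval_status": ["+", "-"],
})

target_column = "approval_status"
X_toy = toy.drop(columns=[target_column])  # every column except the target
y_toy = toy[target_column]                 # the target as a Series

print(X_toy.shape, y_toy.shape)
```

The features keep all rows but lose the target column, while the target is a one-dimensional Series with the same number of rows.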

Cleaning

1. Drop non-essential columns

non_essentials_columns = ['driver_license', 'zip_code']
# drop() returns a new DataFrame, so assign the result back to X
X = X.drop(columns=non_essentials_columns)
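
Note that `DataFrame.drop` returns a new DataFrame and leaves the original unchanged unless the result is assigned back (or `inplace=True` is passed). A minimal sketch on made-up data (the frame `toy` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data with one column we want to remove
toy = pd.DataFrame({"debt": [0.0, 4.46], "zip_code": ["00202", "00043"]})

toy.drop(columns=["zip_code"])            # result is discarded, toy is unchanged
columns_without_assignment = list(toy.columns)

toy = toy.drop(columns=["zip_code"])      # result is kept
columns_with_assignment = list(toy.columns)

print(columns_without_assignment)
print(columns_with_assignment)
```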