Predicting Credit Card Approvals
Setup
1. Importing necessary libraries
import pandas as pd
import numpy as np
2. Loading dataset
cc_apps = pd.read_csv('datasets/cc_approvals.data', header=None)
cc_apps
3. Inspecting the applications
The output may look confusing at first sight, but let's try to identify the most important features of a credit card application. The features of this dataset have been anonymized to protect privacy, but this blog gives us a pretty good overview of the probable features. The probable features in a typical credit card application are:
0: Gender
1: Age
2: Debt
3: Married
4: BankCustomer
5: EducationLevel
6: Ethnicity
7: YearsEmployed
8: PriorDefault
9: Employed
10: CreditScore
11: DriversLicense
12: Citizen
13: ZipCode
14: Income
15: ApprovalStatus
This gives us a pretty good starting point, and we can map these features with respect to the columns in the output.
# Add header
cc_apps.columns = ["gender", "age", "debt", "married", "bank_customer", "education_level", "ethnicity", "years_employed", "prior_default", "employed", "credit_score", "driver_license", "citizen", "zip_code", "income", "approval_status"]
# Inspect some rows
print("Head")
print(cc_apps.head())
# Print summary statistics
print("\n")
print("Description")
print(cc_apps.describe())
# Print DataFrame information
print("\n")
print("Info")
# info() prints its report directly and returns None, so no print() wrapper is needed
cc_apps.info()
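As a small aside on the inspection step, the sketch below uses a hypothetical three-column sample (not the real file) to show why the raw output has numeric column labels and why the summary statistics cover only some columns. The sample values and the `sample` name are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-column sample standing in for the real file
raw = io.StringIO("b,30.83,+\na,58.67,-\n")
sample = pd.read_csv(raw, header=None)

# With header=None, pandas assigns integer labels 0, 1, 2, ...
print(list(sample.columns))

# Assigning names by position replaces the integer labels
sample.columns = ["gender", "age", "approval_status"]

# On a mixed-type frame, describe() summarizes only numeric columns by default
print(sample.describe().columns.tolist())
```

Passing `include="all"` to `describe()` would also summarize the non-numeric columns.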
4. Splitting the dataset into "features" and "target"
target_column = 'approval_status'
X = cc_apps.drop(columns=[target_column])
y = cc_apps[target_column]
Cleaning
1. Drop non-essential columns
non_essentials_columns = ['driver_license', 'zip_code']
# drop() returns a new DataFrame, so assign the result back to X
X = X.drop(columns=non_essentials_columns)
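One detail worth keeping in mind here: `DataFrame.drop` returns a new DataFrame by default and leaves the original untouched, so the result has to be assigned back for the columns to stay dropped. A minimal sketch on a hypothetical toy frame (`X_demo` and its values are made up; only the column names mirror the dataset):

```python
import pandas as pd

# Hypothetical toy features; only the column names mirror the dataset above
X_demo = pd.DataFrame({
    "age": [30.83, 58.67],
    "driver_license": ["f", "t"],
    "zip_code": ["00202", "00043"],
})

# Calling drop() without assignment leaves X_demo unchanged
X_demo.drop(columns=["driver_license", "zip_code"])
print(list(X_demo.columns))

# Assigning the result back keeps the change
X_demo = X_demo.drop(columns=["driver_license", "zip_code"])
print(list(X_demo.columns))
```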
2. Replace NaN values