Richard Pallangyo

Project: Does a Borrower Meet Credit Underwriting Criteria?

Introduction

The goal of this project is to predict whether a borrower meets credit policy criteria. The analysis uses a loan dataset, together with the factors typically considered when evaluating loan applications, to determine whether or not a borrower meets the credit underwriting criteria.

The project consists of two main steps.

  1. Exploratory Data Analysis
  2. Predictive Analytics

The first step will help us understand how the factors/variables in our dataset relate to whether a borrower meets credit policy criteria.

The second step will involve building four models and comparing their performance. These models will also help us understand whether the predictors in our dataset can explain if a borrower meets credit policy criteria.

Load all required packages

# Load packages
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.preprocessing import StandardScaler 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

Load the data

# Load data from the csv file
df = pd.read_csv('loan_data.csv', index_col=None)

# Change the dots in the column names to underscores
df.columns = [c.replace(".", "_") for c in df.columns]
print(f"Number of rows/records: {df.shape[0]}")
print(f"Number of columns/variables: {df.shape[1]}")
df.head()
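
To illustrate what the renaming step does, here is a minimal sketch on a toy DataFrame. The dotted column names and the values below are assumptions made up for illustration; they are not read from loan_data.csv.

```python
import pandas as pd

# Toy frame with dotted column names (hypothetical stand-ins for the raw CSV headers)
toy = pd.DataFrame({"credit.policy": [1, 0], "int.rate": [0.12, 0.15]})

# Same renaming as above: replace every dot with an underscore
toy.columns = [c.replace(".", "_") for c in toy.columns]

print(toy.columns.tolist())

# Underscored names are valid Python identifiers, so attribute access works
print(toy.credit_policy.tolist())
```

One practical reason for the rename is that underscored names can be used with attribute access and in query/formula strings, where a dot would be parsed as an operator.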

Understanding the variables

# Understand the variables
variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])

for i, var in enumerate(df.columns):
    variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]
    
# Join the summary with the variable descriptions
var_dict = pd.read_csv('variable_explanation.csv', index_col=0)
variables.set_index('Variable').join(var_dict)

What variable(s) do we want to predict?

We are interested in whether a borrower meets the credit underwriting criteria, so we will predict the credit_policy variable.
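
Before modeling a binary target like this, it usually helps to check how balanced the two classes are. The sketch below shows one way to do that. It runs on made-up values and assumes credit_policy is coded 1 (meets the criteria) and 0 (does not); on the real data, the same call would be applied to df["credit_policy"].

```python
import pandas as pd

# Made-up target values for illustration only
# Assumed coding: 1 = meets the criteria, 0 = does not
toy = pd.DataFrame({"credit_policy": [1, 1, 1, 0, 1, 0, 1, 1]})

# Share of each class in the target
class_share = toy["credit_policy"].value_counts(normalize=True)
print(class_share)
```

If one class dominates, plain accuracy can look good even for a model that always predicts the majority class, so it is worth knowing the split before comparing models.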

What variables are possible predictors?

We will consider the following inputs (predictors) to the model(s) we will construct.

  • The interest rate of the loan.
  • The monthly installments owed by the borrower if the loan is funded.
  • The natural log of the self-reported annual income of the borrower.
  • The debt-to-income ratio of the borrower.
  • The FICO credit score of the borrower.
  • The number of days the borrower has had a credit line.
  • The borrower's revolving balance.
  • The borrower's revolving line utilization rate.
  • The borrower's number of inquiries by creditors in the last 6 months.
  • The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
  • The borrower's number of derogatory public records.
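
The predictor list above can be turned into a feature matrix and a target vector along these lines. This is a sketch under assumptions: the column names are guesses at how the predictors are labelled after the dots are replaced with underscores (only credit_policy is confirmed by the text), `split_features_target` is a hypothetical helper, and the two rows are made up to show the resulting shapes.

```python
import pandas as pd

# Assumed column names (after replacing dots with underscores); verify against df.columns
predictors = [
    "int_rate", "installment", "log_annual_inc", "dti", "fico",
    "days_with_cr_line", "revol_bal", "revol_util",
    "inq_last_6mths", "delinq_2yrs", "pub_rec",
]
target = "credit_policy"


def split_features_target(data, predictors, target):
    """Return (X, y), raising a clear error if any expected column is missing."""
    missing = [c for c in predictors + [target] if c not in data.columns]
    if missing:
        raise KeyError(f"Columns not found in data: {missing}")
    return data[predictors], data[target]


# Made-up two-row frame, only to show the shapes that come back
toy = pd.DataFrame(
    [[0.12, 829.1, 11.35, 19.5, 737, 5640.0, 28854, 52.1, 0, 0, 0, 1],
     [0.16, 228.2, 11.08, 14.3, 612, 2760.0, 33623, 76.7, 3, 1, 0, 0]],
    columns=predictors + [target],
)
X, y = split_features_target(toy, predictors, target)
print(X.shape, y.tolist())
```

Checking for missing columns up front means a mismatch between the assumed names and the actual file surfaces immediately, with a readable message, instead of later during model fitting.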

Exploratory Data Analysis

Graphical Summaries