Loan: Does a Borrower Meet Credit Underwriting Criteria

    Richard Pallangyo

    Project: Does a Borrower Meet Credit Underwriting Criteria.


    The goal of this project is to predict whether a borrower meets the credit policy criteria. The analysis uses a loan dataset, together with the factors typically considered when evaluating loan applications, to determine whether a borrower meets the credit underwriting criteria.

    The project will include two significant steps.

    1. Exploratory Data Analysis
    2. Predictive Analytics.

    The first step will help us understand how the factors/variables in our dataset relate to whether a borrower meets the credit policy criteria.

    The second step will involve building four models and comparing their performance. These models will also help us understand whether the predictors in our dataset can explain whether a borrower meets the credit policy criteria.
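    As a rough preview of the second step, the sketch below fits the four model types imported later in this project and compares them on a held-out split. It runs on synthetic data rather than the loan dataset, and the choice of accuracy as the metric, the split size, and all hyperparameters are assumptions made for illustration, not the project's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data as a stand-in for the loan dataset.
X, y = make_classification(n_samples=500, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters here are illustrative assumptions.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# XGBoost is a separate package; include it only if it is installed.
try:
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit every model on the same training split and score it on the same test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print models from best to worst test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

    Scoring every model on the same held-out split keeps the comparison like-for-like. Accuracy alone can mislead when the classes are imbalanced, so other metrics may be worth adding.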

    Load all required packages

    # Load packages
    import numpy as np 
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn import metrics
    from sklearn.preprocessing import StandardScaler 
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from xgboost import XGBClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier

    Load the data

    # Load data from the csv file
    df = pd.read_csv('loan_data.csv', index_col=None)
    # Change the dots in the column names to underscores
    df.columns = [c.replace(".", "_") for c in df.columns]
    print(f"Number of rows/records: {df.shape[0]}")
    print(f"Number of columns/variables: {df.shape[1]}")
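    The column renaming above can be checked on a small in-memory example. The sketch below uses a hypothetical two-row stand-in for loan_data.csv (the dotted column names and values are assumed for illustration) and writes the same rename with the vectorized string accessor.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for loan_data.csv; the dotted column
# names and the values are assumptions, not taken from the real file.
csv_text = "credit.policy,int.rate,fico\n1,0.1189,737\n0,0.1071,707\n"
sample = pd.read_csv(io.StringIO(csv_text))

# Same renaming as the list comprehension, using the string accessor.
# regex=False treats "." as a literal dot rather than a regex wildcard.
sample.columns = sample.columns.str.replace(".", "_", regex=False)

print(list(sample.columns))
```

    Passing regex=False explicitly avoids relying on the pandas default, which has changed between versions.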

    Understanding the variables

    # Understand the variables
    variables = pd.DataFrame(columns=['Variable','Number of unique values','Values'])
    for i, var in enumerate(df.columns):
        variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]
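    # Load the variable explanations and join them to the summary.
    # Assumption: variable_explanation.csv is indexed by variable name.
    var_dict = pd.read_csv('variable_explanation.csv', index_col=0)
    variables = variables.set_index('Variable').join(var_dict)

    What variable(s) do we want to predict?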

    We are interested in whether a borrower meets the credit underwriting criteria, so we will predict the credit_policy variable.
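    Before modeling a binary target, it usually helps to check how balanced its classes are. The sketch below is a minimal illustration on made-up values; it assumes credit_policy is coded as 1 (meets the criteria) and 0 (does not), which should be verified against the data.

```python
import pandas as pd

# Hypothetical target values; in the project this would be df["credit_policy"].
# The 1/0 coding is an assumption.
target = pd.Series([1, 1, 0, 1, 1, 0, 1, 1], name="credit_policy")

# Absolute and relative frequency of each class.
counts = target.value_counts()
shares = target.value_counts(normalize=True)

balance = pd.DataFrame({"count": counts, "share": shares})
print(balance)
```

    A heavily skewed split would suggest stratifying the train/test split and looking beyond plain accuracy when comparing models.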

    What variables are possible predictors?

    We will consider the following inputs (predictors) to the model(s) we will construct.

    • The interest rate of the loan.
    • The monthly installments owed by the borrower if the loan is funded.
    • The natural log of the self-reported annual income of the borrower.
    • The debt-to-income ratio of the borrower.
    • The FICO credit score of the borrower.
    • The number of days the borrower has had a credit line.
    • The borrower's revolving balance.
    • The borrower's revolving line utilization rate.
    • The borrower's number of inquiries by creditors in the last 6 months.
    • The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
    • The borrower's number of derogatory public records.
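    The predictors listed above can be gathered into a feature matrix, split into training and test sets, and standardized. The sketch below is one way to do this under stated assumptions: the column names are guesses at the underscore-renamed headers and should be checked against df.columns, the data is random placeholder data so the example runs without loan_data.csv, and the split size and random seed are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumed column names (after the dot-to-underscore rename);
# verify them against df.columns before use.
predictors = [
    "int_rate", "installment", "log_annual_inc", "dti", "fico",
    "days_with_cr_line", "revol_bal", "revol_util",
    "inq_last_6mths", "delinq_2yrs", "pub_rec",
]

# Random placeholder data so the sketch runs without loan_data.csv.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, len(predictors))), columns=predictors)
demo["credit_policy"] = rng.integers(0, 2, size=200)

X = demo[predictors]
y = demo["credit_policy"]

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

    Fitting the scaler on the training split alone keeps information from the test set out of the preprocessing step. Scaling matters most for distance-based models such as k-nearest neighbors.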

    Exploratory Data Analysis

    Graphical Summaries