Loan Data — DataLab

Objective

Build a classifier to predict whether a loan will be paid back based on this data. There are two things to note. First, the classes are imbalanced: there are fewer examples of loans that were not fully paid. Second, it is more important to correctly identify loans that will not be paid back than loans that will. Note how this is accounted for when training and evaluating your model.

Prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
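
One common way to address both points in the objective is to re-weight the minority class during training and to evaluate with metrics that focus on the "not fully paid" class. The sketch below is illustrative only: it uses synthetic data from `make_classification` as a stand-in for the loan data, and the choice of logistic regression, `class_weight="balanced"`, recall, and the F2 score are assumptions, not the method used later in this report.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (assumption): roughly 16% positives,
# where 1 means "not fully paid".
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.84, 0.16], random_state=0)

# Stratify so the train and test sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Same model, with and without re-weighting the minority class.
models = {
    "unweighted": LogisticRegression(max_iter=1000),
    "balanced": LogisticRegression(max_iter=1000, class_weight="balanced"),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        # Share of truly unpaid loans that the model flags.
        "recall": recall_score(y_test, pred),
        # F2 weights recall more heavily than precision.
        "f2": fbeta_score(y_test, pred, beta=2),
    }
    print(f"{name}: recall={results[name]['recall']:.2f}, "
          f"F2={results[name]['f2']:.2f}")
```

Re-weighting typically raises recall on the minority class at the cost of more false alarms, so both numbers are worth reporting side by side.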


import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
from matplotlib import pyplot as plt
import seaborn as sns

y_ex_true1  = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
y_ex_pred1  = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                      0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])

# Create a heatmap of the confusion matrix
matrix = confusion_matrix(y_ex_true1, y_ex_pred1)
text = np.array([['Good Loans Approved %', 'Good Loans Disallowed %'],
                ['Bad Loans Approved %', 'Bad Loans Disallowed %']])

# Combine each cell label with its count; the loop variables get their own
# names so they do not shadow the `text` and `matrix` arrays.
formatted_text = np.asarray(
    [f"{label}\n{count:.0f}" for label, count in zip(text.flatten(), matrix.flatten())]
).reshape(2, 2)

# drawing heatmap
fig, ax = plt.subplots(figsize=(12,2))
sns.set(font_scale=1.3)
ax = sns.heatmap(matrix, annot=formatted_text, fmt="", linewidths=1, cbar=False, ax=ax)
ax.set_title('Confusion Matrix - What "good" looks like!', size = 18)
plt.show()

target_names = ['Fully Paid', 'Not Fully Paid']
print(classification_report(y_ex_true1,y_ex_pred1,target_names=target_names, zero_division=1))
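
As a contrast to the perfect predictions above, the following sketch scores a hypothetical naive model that approves every loan, using the same 16/84 class split. It is an illustration of why accuracy alone is misleading on imbalanced data, not part of the original analysis; the names `y_naive`, `naive_accuracy`, `naive_recall`, and `naive_matrix` are introduced here for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Same class balance as the example above:
# 16 loans not fully paid (1) and 84 loans fully paid (0).
y_true = np.array([1] * 16 + [0] * 84)

# Hypothetical naive model: predict "fully paid" for every loan.
y_naive = np.zeros_like(y_true)

naive_accuracy = accuracy_score(y_true, y_naive)
naive_recall = recall_score(y_true, y_naive, zero_division=0)
naive_matrix = confusion_matrix(y_true, y_naive)

print(f"accuracy: {naive_accuracy:.2f}")               # 0.84
print(f"recall (not fully paid): {naive_recall:.2f}")  # 0.00
print(naive_matrix)  # rows are true classes, columns are predicted classes
```

The naive model is right 84% of the time yet catches none of the 16 bad loans, which is why recall on the "Not Fully Paid" class matters more here than overall accuracy.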

Table of Contents

Objective
Executive Summary
Load Dataset
- Exploratory Data Analysis
  - Section 1.2.1
  - Section 1.2.2
  - Section 1.2.3
Model Building
- Section 2.1
- Section 2.2

Chapter 1

Executive Summary

Load Dataset

# Import initial modules, set the seaborn style, read the dataset into a dataframe, and display the first five rows.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
SEED = 121864
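# Note: matplotlib 3.6+ renamed this style to 'seaborn-v0_8-colorblind'.
plt.style.use('seaborn-colorblind')
# color_palette() only returns a palette; set_palette() makes it the default.
sns.set_palette('colorblind')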
df = pd.read_csv("loan_data.csv")
df.head()

# Inspect column names and data types.
df.info()

df.describe().round(3)

Eight of the thirteen numeric columns have a minimum value of zero. Zero is a valid value for each of these columns, so the zeros are kept as they are.
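
The count of zero-minimum columns can be checked programmatically rather than read off the `describe()` table. The helper below is a minimal sketch: `zero_min_columns` and the toy frame with its column names are hypothetical and only show the idea; with the real data one would call `zero_min_columns(df)`.

```python
import pandas as pd


def zero_min_columns(frame: pd.DataFrame) -> list:
    """Return the numeric columns whose minimum value is exactly zero."""
    numeric_mins = frame.select_dtypes(include="number").min()
    return numeric_mins[numeric_mins == 0].index.tolist()


# Hypothetical miniature frame used only to demonstrate the helper.
toy = pd.DataFrame({
    "delinquencies": [0, 1, 0],
    "interest_rate": [0.11, 0.09, 0.14],
    "purpose": ["credit_card", "home", "other"],
})

zero_cols = zero_min_columns(toy)
print(zero_cols)  # ['delinquencies']
```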

# Inspect the text column.
df.describe(include='object')

# Improve readability by replacing "." in column names with " " and add capitalization.
df.columns = [c.replace(".", " ") for c in df.columns]
df.columns = df.columns.str.title()
df.describe().round(2)
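
To make the effect of the renaming concrete, here is the same transformation applied to a few dotted column names. The names are hypothetical examples chosen for illustration, and the one-line chained form is an equivalent alternative to the two statements above, not a required change.

```python
import pandas as pd

# Hypothetical dotted column names, used only to show the transformation.
example = pd.DataFrame(columns=["int.rate", "not.fully.paid", "fico"])

# Replace the literal "." with a space, then capitalize each word.
example.columns = example.columns.str.replace(".", " ", regex=False).str.title()

renamed = example.columns.tolist()
print(renamed)  # ['Int Rate', 'Not Fully Paid', 'Fico']
```

After this step, any later code has to refer to the columns by their new names, for example `df["Int Rate"]` rather than `df["int.rate"]`.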
