Loan Data
Objective
Build a classifier to predict whether a loan will be paid back based on this data. There are two things to note. First, there is class imbalance: there are fewer examples of loans that were not fully paid. Second, it is more important to correctly identify loans that will not be paid back than loans that will be. Note how this is accounted for in training and evaluating your model.
Prepare a report that is accessible to a broad audience. It will need to outline your motivation, analysis steps, findings, and conclusions.
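One common way to account for class imbalance during training is to weight each class inversely to its frequency. The following is a minimal sketch under stated assumptions, not necessarily the approach used later in this notebook: the label array is hypothetical (16 unpaid loans out of 100) and assumes the encoding 1 = not fully paid.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 1 = not fully paid (minority class), 0 = fully paid.
y = np.array([1] * 16 + [0] * 84)

# "balanced" weights are n_samples / (n_classes * count_of_class).
classes = np.array([0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

# The minority class receives the larger weight: 100 / (2 * 16) = 3.125.
print(class_weight)
```

A dictionary like this can be passed as the `class_weight` argument of scikit-learn estimators that support it, such as `LogisticRegression`. Passing the string `"balanced"` directly applies the same formula.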
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report
from matplotlib import pyplot as plt
import seaborn as sns
y_ex_true1 = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
y_ex_pred1 = np.array([1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
# Create a heatmap of the confusion matrix.
# Rows are true classes and columns are predicted classes (0 = fully paid, 1 = not fully paid).
matrix = confusion_matrix(y_ex_true1, y_ex_pred1)
labels = np.array([['Good Loans Approved %', 'Good Loans Disallowed %'],
                   ['Bad Loans Approved %', 'Bad Loans Disallowed %']])
# Combine each label with its count. The example has 100 loans, so counts read as percentages.
formatted_text = np.asarray(["{0}\n{1:.0f}".format(label, value)
                             for label, value in zip(labels.flatten(), matrix.flatten())]).reshape(2, 2)
# Draw the heatmap.
fig, ax = plt.subplots(figsize=(12, 2))
sns.set(font_scale=1.3)
ax = sns.heatmap(matrix, annot=formatted_text, fmt="", linewidths=1, cbar=False)
ax.set_title('Confusion Matrix - What "good" looks like!', size=18)
plt.show()
target_names = ['Fully Paid', 'Not Fully Paid']
print(classification_report(y_ex_true1, y_ex_pred1, target_names=target_names, zero_division=1))
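For contrast with the perfect example above, the sketch below scores a hypothetical imperfect classifier on the metrics that matter most for this objective. The predictions are invented for illustration (12 of 16 bad loans caught, 6 good loans wrongly flagged), and using the F2 score to favor recall is an assumption rather than something the notebook prescribes.

```python
import numpy as np
from sklearn.metrics import fbeta_score, precision_score, recall_score

# Encoding assumed from the target names above: 1 = not fully paid, 0 = fully paid.
y_true = np.array([1] * 16 + [0] * 84)

# Hypothetical predictions: 12 bad loans flagged, 4 missed, 6 good loans wrongly flagged.
y_pred = np.array([1] * 12 + [0] * 4 + [1] * 6 + [0] * 78)

# Recall on the "not fully paid" class: share of bad loans that were caught.
recall_bad = recall_score(y_true, y_pred, pos_label=1)
# Precision on the same class: share of flagged loans that were really bad.
precision_bad = precision_score(y_true, y_pred, pos_label=1)
# F2 weights recall more heavily than precision.
f2_bad = fbeta_score(y_true, y_pred, beta=2, pos_label=1)

print(f"recall={recall_bad:.3f} precision={precision_bad:.3f} F2={f2_bad:.3f}")
```

Here 4 of the 16 bad loans slip through, so recall on the "Not Fully Paid" class is 0.75 even though overall accuracy is 90%. This is why accuracy alone is a poor yardstick for this problem.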
Table of Contents
- Objective
- Executive Summary
- Load Dataset
- Exploratory Data Analysis
- Section 1.2.1
- Section 1.2.2
- Section 1.2.3
- Model Building
- Section 2.1
- Section 2.2
Chapter 1
Executive Summary
Load Dataset
# Import the initial modules, set the seaborn style, read the dataset into a DataFrame, and display the first five rows.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
SEED = 121864
# Matplotlib 3.6 renamed this style to 'seaborn-v0_8-colorblind'; use that name on newer versions.
plt.style.use('seaborn-colorblind')
# set_palette applies the palette; color_palette only returns it.
sns.set_palette('colorblind')
df = pd.read_csv("loan_data.csv")
df.head()
# Inspect column names and data types.
df.info()
df.describe().round(3)
Eight of the thirteen numeric columns have a minimum value of zero. Zero is a valid value in each of these columns, so it does not indicate missing data.
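A count like the one above can be reproduced programmatically. The following is a minimal sketch on a small hypothetical DataFrame; the column names and values are stand-ins, not the actual loan data.

```python
import pandas as pd

# Hypothetical stand-in for the loan data: three numeric columns and one text column.
df_demo = pd.DataFrame({
    "balance": [0, 120, 45],
    "rate": [0.07, 0.11, 0.09],
    "inquiries": [0, 2, 1],
    "purpose": ["car", "home", "car"],
})

# Minimum of every numeric column, then keep the columns whose minimum is zero.
numeric_min = df_demo.select_dtypes(include="number").min()
zero_min_cols = numeric_min[numeric_min == 0].index.tolist()

print(f"{len(zero_min_cols)} of {numeric_min.size} numeric columns have a minimum of zero: {zero_min_cols}")
```

Whether a zero is a legitimate value or a placeholder for missing data still has to be judged column by column from what each field means.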
# Inspect the text column.
df.describe(include='object')
# Improve readability by replacing "." in column names with " " and capitalizing each word.
df.columns = [c.replace(".", " ") for c in df.columns]
df.columns = df.columns.str.title()
df.describe().round(2)
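To see what the renaming step does, here is a minimal sketch applied to a few hypothetical dotted column names. The names are assumptions chosen for illustration and are not taken from the output above.

```python
import pandas as pd

# Hypothetical dotted column names used only for illustration.
raw_columns = pd.Index(["int.rate", "days.with.cr.line", "inq.last.6mths"])

# Replace literal dots with spaces, then title-case each name.
cleaned = raw_columns.str.replace(".", " ", regex=False).str.title()

print(list(cleaned))
```

Title-casing capitalizes the first letter after any non-letter character, so a name containing a digit followed by letters ends up with a capital in the middle (here "6mths" becomes "6Mths").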