
Project - Pacmann AI Data Science - Credit Risk Analysis

  1. Name: Geraldo Enrico Semen
  2. JPP Max Data Science Batch 5

Disclaimer: If you are prompted to download the PDF, please ignore the prompt. Sorry for the inconvenience.

Disclaimer: If you find this platform unappealing (hehe, sorry, I will be using DataLab as my portfolio warehouse from now on), you can open the notebook in the Google Colab environment instead :): https://drive.google.com/file/d/17J9cgIG7hMAx4H3eCBmycv6jOM8oDI7f/view?usp=sharing

Thank you for your understanding!

0. REVISION

Dear Pacmann, I have revised Sections 1.8, 2.2, 2.3, and 3.1 based on your suggestions :).

Thank you for the critique.

Link for Google Colab: https://drive.google.com/file/d/1J2yRHEOdf8VxQPHwE_YgQ1KbkMTx4j_A/view?usp=sharing

Link for this Workbook (if you don't want to view it in PDF format): https://www.datacamp.com/datalab/w/931b5b23-3322-4051-a6fb-3781ac0ec884/edit

1. Data Preparation

1.0 Key Definition

1.0.1 Loan Default

  • Definition: Loan default occurs when a borrower fails to meet the legal obligations of a loan agreement, typically by missing scheduled payments of interest or principal.
  • Causes: Common causes include financial mismanagement, economic downturns, loss of employment, or unexpected expenses like medical bills.
  • Consequences:
    • Credit Impact: Defaulting on a loan can significantly damage a borrower’s credit score, making it difficult to obtain future credit.
    • Legal Actions: Lenders may initiate collection efforts, employ debt collectors, or pursue legal action to recover the outstanding amount.
    • Asset Repossession: For secured loans, lenders can repossess collateral, such as homes or cars.
    • Financial Strain: Defaults can lead to increased interest costs and additional fees.

1.0.2 Loan Non-Default

  • Definition: Non-default refers to a scenario where the borrower continues to meet all the terms and conditions of the loan agreement without missing payments.
  • Benefits:
    • Credit Health: Maintaining timely payments helps preserve and potentially improve credit scores.
    • Financial Stability: Consistent payments prevent legal actions and additional financial burdens.
    • Access to Credit: Borrowers in good standing are more likely to receive favorable terms on future loans.

1.1 Importing the necessary libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

1.2 Loading the Data

df = pd.read_csv('credit_risk_dataset.csv')
df.head()

1.3 Dealing with the missing values

# Display basic information about the dataframe
df.info()
# Check for missing values
df.isnull().sum()

The null counts indicate missing values in two columns: loan_int_rate and person_emp_length.
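To make the missing-value check easier to read, the counts can be turned into percentages, and one possible treatment can be tried out. The sketch below is only an illustration: the toy DataFrame, its values, the other_col column, and the missing_summary helper are hypothetical, and median imputation is an assumed strategy rather than necessarily the one used later in this workbook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "person_emp_length": [1.0, np.nan, 5.0, 10.0],
    "loan_int_rate": [11.5, 7.9, np.nan, np.nan],
    "other_col": [5000, 12000, 8000, 3000],  # hypothetical complete column
})

def missing_summary(frame):
    """Return count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)

summary = missing_summary(toy)
print(summary)

# Assumed treatment: fill each affected column with its own median
medians = toy[summary.index].median()
imputed = toy.fillna(medians)
print(imputed.isnull().sum())
```

In a modeling workflow, the medians would normally be computed on the training split only and then applied to the test split, so that no information leaks from the test data.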

1.4 Numerical Descriptive Statistics