Project - Pacmann AI Data Science - Credit Risk Analysis
- Name: Geraldo Enrico Semen
- JPP Max Data Science Batch 5
Disclaimer: Please do not download the PDF if prompted; the prompt can be ignored. Sorry for the inconvenience.
Disclaimer: If you find this platform unappealing (sorry, I now prefer DataLab as my portfolio warehouse), a Google Colab version is available at: https://drive.google.com/file/d/17J9cgIG7hMAx4H3eCBmycv6jOM8oDI7f/view?usp=sharing
Thank you for your understanding!
0. REVISION
Dear Pacmann, I have revised Sections 1.8, 2.2, 2.3, and 3.1 based on your suggestions.
Thank you for the critique.
Link for Google Colab: https://drive.google.com/file/d/1J2yRHEOdf8VxQPHwE_YgQ1KbkMTx4j_A/view?usp=sharing
Link for this Workbook (if you don't want to view it in this PDF format): https://www.datacamp.com/datalab/w/931b5b23-3322-4051-a6fb-3781ac0ec884/edit
1. Data Preparation
1.0 Key Definition
1.0.1 Loan Default
- Definition: Loan default occurs when a borrower fails to meet the legal obligations of a loan agreement, typically by missing scheduled payments of interest or principal.
- Causes: Common causes include financial mismanagement, economic downturns, loss of employment, or unexpected expenses like medical bills.
- Consequences:
  - Credit Impact: Defaulting on a loan can significantly damage a borrower’s credit score, making it difficult to obtain future credit.
  - Legal Actions: Lenders may initiate collection efforts, employ debt collectors, or pursue legal action to recover the outstanding amount.
  - Asset Repossession: For secured loans, lenders can repossess collateral, such as homes or cars.
  - Financial Strain: Defaults can lead to increased interest costs and additional fees.
1.0.2 Loan Non-Default
- Definition: Non-default refers to a scenario where the borrower continues to meet all the terms and conditions of the loan agreement without missing payments.
- Benefits:
  - Credit Health: Maintaining timely payments helps preserve and potentially improve credit scores.
  - Financial Stability: Consistent payments prevent legal actions and additional financial burdens.
  - Access to Credit: Borrowers in good standing are more likely to receive favorable terms on future loans.
1.1 Importing the necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
from sklearn.metrics import classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns
1.2 Loading the Data
df = pd.read_csv('credit_risk_dataset.csv')
df.head()
1.3 Dealing with the missing values
# Display basic information about the dataframe
df.info()
# Check for missing values
df.isnull().sum()
It seems that there are missing values in loan_int_rate and person_emp_length.
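One way to act on this finding is sketched below. It is only an illustration, not necessarily the approach taken later in this workbook: it assumes median imputation is acceptable for both columns, and it uses a small made-up DataFrame (`toy`, a hypothetical name) in place of the real dataset so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; all values are made up for illustration.
toy = pd.DataFrame({
    "person_emp_length": [1.0, np.nan, 5.0, 10.0],
    "loan_int_rate": [11.0, 13.0, np.nan, np.nan],
    "loan_amnt": [5000, 12000, 8000, 3000],
})

# Share of missing values per column, in percent.
missing_pct = toy.isnull().mean() * 100
print(missing_pct)

# Median imputation for the two affected columns (an assumed strategy).
cols = ["person_emp_length", "loan_int_rate"]
imputed = toy.copy()
imputed[cols] = imputed[cols].fillna(imputed[cols].median())

# Confirm that no missing values remain.
print(imputed.isnull().sum())
```

When a train/test split is involved, it is generally safer to compute the medians on the training portion only and reuse them on the test portion, so that no information from the test data leaks into preprocessing.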
1.4 Numerical Descriptive Statistics