Finance Data
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
loan_data = pd.read_csv("loan_data.csv")
loan_data.head()
# Generate descriptive statistics
loan_data.describe()
Data dictionary
In the CSV the column names use dots instead of underscores (for example credit.policy, int.rate and not.fully.paid), which is how the code in this notebook refers to them.
| Variable | Explanation |
|---|---|
| credit_policy | 1 if the customer meets the credit underwriting criteria; 0 otherwise. |
| purpose | The purpose of the loan. |
| int_rate | The interest rate of the loan (riskier borrowers are assigned higher interest rates). |
| installment | The monthly installment owed by the borrower if the loan is funded. |
| log_annual_inc | The natural log of the borrower's self-reported annual income. |
| dti | The borrower's debt-to-income ratio (amount of debt divided by annual income). |
| fico | The borrower's FICO credit score. |
| days_with_cr_line | The number of days the borrower has had a credit line. |
| revol_bal | The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle). |
| revol_util | The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available). |
| inq_last_6mths | The borrower's number of inquiries by creditors in the last 6 months. |
| delinq_2yrs | The number of times the borrower had been 30+ days past due on a payment in the past 2 years. |
| pub_rec | The borrower's number of derogatory public records. |
| not_fully_paid | 1 if the loan is not fully paid; 0 otherwise. |
Source of dataset.
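Because log_annual_inc stores the natural log of income, applying the exponential recovers the income in its original units, which is easier to read when describing borrowers. The sketch below assumes the dotted column name log.annual.inc used by the code in this notebook; the helper name add_annual_income and the toy rows are hypothetical, made up for illustration and not taken from loan_data.csv.

```python
import numpy as np
import pandas as pd


def add_annual_income(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'annual_inc' column recovered from 'log.annual.inc'.

    Assumes 'log.annual.inc' holds natural logs, so np.exp() inverts it.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    out["annual_inc"] = np.exp(out["log.annual.inc"])
    return out


# Toy rows for illustration only (hypothetical, not from loan_data.csv)
toy = pd.DataFrame({"log.annual.inc": [np.log(50_000), np.log(85_000)]})
print(add_annual_income(toy)["annual_inc"].round(2).tolist())  # [50000.0, 85000.0]
```

On the real data, add_annual_income(loan_data) would add the column without modifying loan_data itself.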
# Check for missing (NaN) values in each column
loan_data.isnull().sum()
# List the column names
loan_data.keys()
# Number of loans by purpose
sns.set_style('darkgrid')
sns.countplot(x='purpose', data=loan_data)
plt.xticks(rotation=90)
plt.title('Number of Loans by Purpose')
plt.show()
debt_consolidation is the most common loan purpose, followed by all_other (2nd) and credit_card (3rd).
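The ranking read off the bar chart can be confirmed numerically with value_counts, which sorts categories by frequency in descending order. This is a minimal sketch; the helper name top_purposes and the toy rows are hypothetical.

```python
import pandas as pd


def top_purposes(df: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n most frequent values of the 'purpose' column with their counts."""
    # value_counts() sorts by count, highest first, by default
    return df["purpose"].value_counts().head(n)


# Toy rows for illustration only (hypothetical, not from loan_data.csv)
toy = pd.DataFrame({
    "purpose": ["debt_consolidation"] * 4
    + ["all_other"] * 3
    + ["credit_card"] * 2
    + ["educational"]
})
print(top_purposes(toy))
```

Calling top_purposes(loan_data) gives the counts behind the three tallest bars.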
# Median monthly installment
loan_data['installment'].median()
# Compare interest rate against installment, split by credit.policy
sns.set_style('darkgrid')
sns.relplot(x='installment',
            y='int.rate',
            data=loan_data,
            kind='line',
            col='credit.policy',
            hue='credit.policy',
            errorbar=None)  # seaborn >= 0.12; older versions use ci=None
plt.show()
- credit.policy = 0: the borrower does not meet the credit underwriting criteria.
- credit.policy = 1: the borrower meets the criteria.
- The two groups differ slightly: at the same installment amount, borrowers who do not meet the criteria tend to be charged a slightly higher interest rate than borrowers who do (1).
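To put a number on "slightly higher", one option is to group loans into installment ranges and compare the mean interest rate of the two credit.policy groups inside each range. The sketch below assumes equal-width bins from pd.cut (a choice made here, not the notebook's method) and the dotted column names used above; the helper name rate_gap_by_installment and the toy rows are hypothetical.

```python
import pandas as pd


def rate_gap_by_installment(df: pd.DataFrame, bins: int = 5) -> pd.DataFrame:
    """Mean int.rate per credit.policy group inside equal-width installment bins."""
    # pd.cut assigns each row to one of `bins` equal-width installment ranges
    binned = df.assign(installment_bin=pd.cut(df["installment"], bins=bins))
    # One row per installment bin, one column per credit.policy value (0 and 1)
    table = binned.pivot_table(
        index="installment_bin",
        columns="credit.policy",
        values="int.rate",
        aggfunc="mean",
        observed=True,  # keep only bins that actually contain rows
    )
    # Positive gap: credit.policy == 0 borrowers pay a higher mean rate in that bin
    table["gap"] = table[0] - table[1]
    return table


# Toy rows for illustration only (hypothetical, not from loan_data.csv)
toy = pd.DataFrame({
    "installment": [100, 110, 500, 510],
    "credit.policy": [0, 1, 0, 1],
    "int.rate": [0.14, 0.12, 0.16, 0.13],
})
print(rate_gap_by_installment(toy, bins=2))
```

A bin where one group has no loans yields NaN in the gap column, so on the real data it is worth checking the bin counts before reading the gaps.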
loan_data['not.fully.paid'].value_counts()
1 = the loan is not fully paid; 0 = otherwise
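Raw counts are easier to compare as proportions, and knowing how unbalanced the two classes are helps when the following plots show them side by side in separate panels. A minimal sketch, with a hypothetical helper name and toy rows:

```python
import pandas as pd


def repayment_share(df: pd.DataFrame) -> pd.Series:
    """Share of loans in each not.fully.paid class (1 = not fully paid, 0 = otherwise)."""
    # normalize=True turns counts into proportions that sum to 1
    return df["not.fully.paid"].value_counts(normalize=True).sort_index()


# Toy rows for illustration only (hypothetical, not from loan_data.csv)
toy = pd.DataFrame({"not.fully.paid": [0, 0, 0, 1]})
print(repayment_share(toy))
```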
# Compare interest rate and annual income, split by repayment status
sns.set_style('darkgrid')
sns.relplot(x='int.rate',
            y='log.annual.inc',
            data=loan_data,
            kind='scatter',
            hue='not.fully.paid', col='not.fully.paid',
            alpha=0.5)
plt.show()
Visually, there is no clear relationship between interest rate or (log) annual income and whether the loan was repaid. Note that:
- 1: the loan was not fully paid;
- 0: otherwise.
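The visual reading can be checked with a correlation coefficient. With a 0/1 target such as not.fully.paid, the Pearson correlation equals the point-biserial correlation; values near 0 support "no clear linear relationship", while a clearly non-zero value would suggest a weak trend the scatter plots hide. Correlation only captures linear association, so this is a complement to the plots rather than a replacement. The helper name and toy rows below are hypothetical.

```python
import pandas as pd


def correlation_with_repayment(df: pd.DataFrame,
                               features=("int.rate", "log.annual.inc")) -> pd.Series:
    """Pearson (point-biserial) correlation of each feature with not.fully.paid."""
    # corrwith correlates every selected column with the given Series
    return df[list(features)].corrwith(df["not.fully.paid"])


# Toy rows for illustration only (hypothetical, not from loan_data.csv)
toy = pd.DataFrame({
    "int.rate": [0.10, 0.12, 0.14, 0.16],
    "log.annual.inc": [11.0, 10.8, 11.0, 10.8],
    "not.fully.paid": [0, 0, 1, 1],
})
print(correlation_with_repayment(toy))
```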
# Compare interest rate by credit.policy, split by not.fully.paid
sns.set_style('darkgrid')
g = sns.catplot(
    data=loan_data, x='credit.policy', y='int.rate', col='not.fully.paid',
    hue='credit.policy', palette='Spectral', legend=False,  # seaborn >= 0.13: palette needs hue
    kind='bar', errorbar=None)
plt.show()
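The bar heights in this chart are the mean int.rate for each credit.policy value within each not.fully.paid panel (mean is the default estimator for seaborn bar plots). A pivot table shows the same numbers as a small grid, which is easier to quote than reading bar heights. A minimal sketch; the helper name and toy rows are hypothetical.

```python
import pandas as pd


def mean_rate_table(df: pd.DataFrame) -> pd.DataFrame:
    """Mean int.rate for each (credit.policy, not.fully.paid) pair."""
    # Rows: credit.policy (the x-axis); columns: not.fully.paid (the two panels)
    return df.pivot_table(
        index="credit.policy",
        columns="not.fully.paid",
        values="int.rate",
        aggfunc="mean",
    )


# Toy rows for illustration only (hypothetical, not from loan_data.csv)
toy = pd.DataFrame({
    "credit.policy": [0, 0, 1, 1, 1],
    "not.fully.paid": [0, 1, 0, 0, 1],
    "int.rate": [0.13, 0.15, 0.10, 0.12, 0.12],
})
print(mean_rate_table(toy))
```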