Finance Data
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

loan_data = pd.read_csv("loan_data.csv")
loan_data.head()

#Generate descriptive statistics
loan_data.describe()

Data dictionary

    Variable           Explanation
0   credit_policy      1 if the customer meets the credit underwriting criteria; 0 otherwise.
1   purpose            The purpose of the loan.
2   int_rate           The interest rate of the loan (more risky borrowers are assigned higher interest rates).
3   installment        The monthly installments owed by the borrower if the loan is funded.
4   log_annual_inc     The natural log of the self-reported annual income of the borrower.
5   dti                The debt-to-income ratio of the borrower (amount of debt divided by annual income).
6   fico               The FICO credit score of the borrower.
7   days_with_cr_line  The number of days the borrower has had a credit line.
8   revol_bal          The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).
9   revol_util         The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).
10  inq_last_6mths     The borrower's number of inquiries by creditors in the last 6 months.
11  delinq_2yrs        The number of times the borrower had been 30+ days past due on a payment in the past 2 years.
12  pub_rec            The borrower's number of derogatory public records.
13  not_fully_paid     1 if the loan is not fully paid; 0 otherwise.
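Because the income column is stored as a natural log, its values are not in currency units. The sketch below shows the transform and its inverse on a few made-up incomes. The `annual_inc` and `recovered_inc` columns are hypothetical helpers that are not part of the dataset, and the dotted name `log.annual.inc` is assumed from the code cells further down.

```python
import numpy as np
import pandas as pd

# Made-up incomes for illustration only (not taken from loan_data.csv).
toy = pd.DataFrame({"annual_inc": [30000.0, 60000.0, 120000.0]})

# Forward transform: natural log, as described in the data dictionary.
toy["log.annual.inc"] = np.log(toy["annual_inc"])

# Inverse transform: exp() recovers the income in its original units.
toy["recovered_inc"] = np.exp(toy["log.annual.inc"])

print(toy)
```

On the real data, `np.exp(loan_data['log.annual.inc'])` would give the self-reported income back in its original units, which can be easier to read on plot axes.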

Source of dataset.

#Check for missing values (NaN) in each column
loan_data.isnull().sum()
loan_data.keys()
#Number of loans per purpose
sns.set_style('darkgrid')
sns.countplot(x='purpose', data=loan_data)
plt.xticks(rotation=90)
plt.title('Number of loans by purpose')
plt.show()

debt_consolidation is the most common loan purpose, followed by all_other and credit_card in second and third place.
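The ranking read off the count plot can also be checked numerically. This is a minimal sketch on a toy frame with made-up rows, so the counts here are not the real ones; on the actual data the same calls would be applied to `loan_data['purpose']`.

```python
import pandas as pd

# Toy stand-in for loan_data (made-up rows, not the real dataset).
toy = pd.DataFrame({
    "purpose": [
        "debt_consolidation", "debt_consolidation", "debt_consolidation",
        "all_other", "all_other",
        "credit_card",
    ]
})

# Count loans per purpose, most frequent first.
purpose_counts = toy["purpose"].value_counts()

# Same ranking expressed as a share of all loans.
purpose_share = toy["purpose"].value_counts(normalize=True)

# The first index entry is the most common purpose.
top_purpose = purpose_counts.index[0]

print(purpose_counts)
print(purpose_share.round(2))
```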

loan_data['installment'].median()
#Compare interest rate vs. installment, split by credit policy
sns.set_style('darkgrid')
sns.relplot(x='installment',
            y='int.rate',
            data=loan_data,
            kind='line',
            col='credit.policy',
            hue='credit.policy',
            ci=None)  # seaborn >= 0.12 prefers errorbar=None
plt.show()
  • credit.policy = 0: the borrower does not meet the credit underwriting criteria.
  • credit.policy = 1: the borrower meets the criteria.
  • There is a slight difference between the two groups: for the same installment amount, borrowers who do not meet the credit policy pay a slightly higher interest rate than borrowers who do.
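One way to put a number on the gap described above is to compare mean interest rates per group, overall and within installment bins. The sketch below uses a toy frame with made-up values, and the bin edges are arbitrary assumptions chosen for illustration, so it shows the method rather than the real result.

```python
import pandas as pd

# Toy stand-in for loan_data (made-up values, not the real dataset).
toy = pd.DataFrame({
    "credit.policy": [0, 0, 1, 1],
    "installment": [100.0, 300.0, 100.0, 300.0],
    "int.rate": [0.14, 0.16, 0.11, 0.13],
})

# Overall mean interest rate per credit-policy group.
mean_rate = toy.groupby("credit.policy")["int.rate"].mean()
gap = mean_rate.loc[0] - mean_rate.loc[1]

# Same comparison within installment bins, so that borrowers with
# similar installments are compared with each other.
bins = pd.cut(toy["installment"], bins=[0, 200, 400])  # arbitrary bin edges
by_bin = (
    toy.groupby([bins, "credit.policy"], observed=True)["int.rate"]
    .mean()
    .unstack("credit.policy")
)

print(mean_rate)
print(by_bin)
```

A positive difference `by_bin[0] - by_bin[1]` in a bin means borrowers who do not meet the policy pay more at that installment level.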
loan_data['not.fully.paid'].value_counts()

1 = the loan is not fully paid; 0 = otherwise

#Compare log annual income vs. interest rate, split by repayment status
sns.set_style('darkgrid')
sns.relplot(x='int.rate',
            y='log.annual.inc',
            data=loan_data,
            kind='scatter',
            hue='not.fully.paid',
            col='not.fully.paid',
            alpha=0.5)
plt.show()

The scatter plots show no clear relationship between interest rate or annual income and whether the loan was fully paid. Note that:

  • 1: the loan is not fully paid
  • 0: otherwise
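A visual impression from overlapping scatter points can be backed up with group means and a simple correlation against the 0/1 repayment flag. The sketch below is one possible check, run on a toy frame with made-up values, so its numbers say nothing about the real dataset.

```python
import pandas as pd

# Toy stand-in for loan_data (made-up values, not the real dataset).
toy = pd.DataFrame({
    "not.fully.paid": [0, 0, 1, 1],
    "int.rate": [0.10, 0.14, 0.12, 0.16],
    "log.annual.inc": [10.5, 11.5, 10.0, 11.0],
})

features = ["int.rate", "log.annual.inc"]

# Mean of each feature for fully paid (0) and not fully paid (1) loans.
group_means = toy.groupby("not.fully.paid")[features].mean()

# Pearson correlation of each feature with the binary flag.
# Values near 0 would support "no clear relationship".
corr = toy[features].corrwith(toy["not.fully.paid"])

print(group_means)
print(corr.round(3))
```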
#Compare mean interest rate by credit.policy, split by not.fully.paid
sns.set_style('darkgrid')
g = sns.catplot(
    data=loan_data, x="credit.policy", y="int.rate", col="not.fully.paid",
    palette='Spectral', kind="bar",
    ci=None)  # seaborn >= 0.12 prefers errorbar=None
plt.show()
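The bar heights in the plot above are group means, so the same information can be read as a small table. This is a sketch on a toy frame with made-up values; on the real data the call would be made on `loan_data` with the same column names.

```python
import pandas as pd

# Toy stand-in for loan_data (made-up values, not the real dataset).
toy = pd.DataFrame({
    "credit.policy": [0, 0, 1, 1],
    "not.fully.paid": [0, 1, 0, 1],
    "int.rate": [0.14, 0.16, 0.11, 0.13],
})

# Mean interest rate for every credit.policy / not.fully.paid combination.
# Rows: credit.policy, columns: not.fully.paid.
rate_table = toy.pivot_table(
    index="credit.policy",
    columns="not.fully.paid",
    values="int.rate",
    aggfunc="mean",
)

print(rate_table)
```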