Loan Data

Ready to put your coding skills to the test? Join us for our Workspace Competition!
For more information, visit datacamp.com/workspacecompetition

Context

This dataset consists of data from almost 10,000 borrowers who took out loans, some of which have been paid back while others are still in progress. It was extracted from lendingclub.com, an organization that connects borrowers with investors. We've included a few suggested questions at the end of this template to help you get started.

Load packages

library(skimr)
library(tidyverse)

Load your Data

loans <- readr::read_csv('data/loans.csv.gz')
skim(loans) %>% 
  select(-(numeric.p0:numeric.p100)) %>%
  select(-(complete_rate))

Understand your data

| Variable | class | description |
|---|---|---|
| credit_policy | numeric | 1 if the customer meets the credit underwriting criteria; 0 otherwise. |
| purpose | character | The purpose of the loan. |
| int_rate | numeric | The interest rate of the loan (more risky borrowers are assigned higher interest rates). |
| installment | numeric | The monthly installments owed by the borrower if the loan is funded. |
| log_annual_inc | numeric | The natural log of the self-reported annual income of the borrower. |
| dti | numeric | The debt-to-income ratio of the borrower (amount of debt divided by annual income). |
| fico | numeric | The FICO credit score of the borrower. |
| days_with_cr_line | numeric | The number of days the borrower has had a credit line. |
| revol_bal | numeric | The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle). |
| revol_util | numeric | The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available). |
| inq_last_6mths | numeric | The borrower's number of inquiries by creditors in the last 6 months. |
| delinq_2yrs | numeric | The number of times the borrower had been 30+ days past due on a payment in the past 2 years. |
| pub_rec | numeric | The borrower's number of derogatory public records. |
| not_fully_paid | numeric | 1 if the loan is not fully paid; 0 otherwise. |
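
Since `log_annual_inc` is stored as a natural log, it can be easier to read on its original scale. The sketch below shows the back-transformation with `exp()`. The values in `toy` are made up for illustration and are not rows from the dataset, and `annual_inc` is a hypothetical column name.

```r
# Hypothetical values for illustration only; not taken from loans.csv.gz
toy <- data.frame(log_annual_inc = c(10.5, 11.0, 11.35))

# log_annual_inc is the natural log of annual income, so exp() inverts it
toy$annual_inc <- exp(toy$log_annual_inc)
toy
```

On the real data, the same idea would be `loans$annual_inc <- exp(loans$log_annual_inc)`.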

Now you can start to explore this dataset with the chance to win incredible prizes! Can't think of where to start? Try your hand at these suggestions:

  • Extract useful insights and visualize them in the most interesting way possible.
  • Find out how long it takes for users to pay back their loan.
  • Build a model that can predict the probability a user will be able to pay back their loan within a certain period.
  • Find out what kind of people take a loan for what purposes.
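
For the modelling suggestion, one possible starting point is a logistic regression on `not_fully_paid`. The variable table lists no repayment-time column, so this sketch models only whether a loan ends up not fully paid, not how long repayment takes. It runs on simulated data: the data frame `sim`, the single predictor, and the coefficients used to generate the outcome are all assumptions made for illustration, not findings from the loans data.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the loans data; the generating coefficients are invented
sim <- data.frame(fico = round(runif(n, 620, 820)))
p_true <- plogis(8 - 0.014 * sim$fico)
sim$not_fully_paid <- rbinom(n, 1, p_true)

# Logistic regression: log-odds of not_fully_paid as a linear function of fico
fit <- glm(not_fully_paid ~ fico, data = sim, family = binomial)

# type = "response" returns predicted probabilities rather than log-odds
sim$p_hat <- predict(fit, type = "response")
summary(sim$p_hat)
```

On the real data, the formula would reference columns of `loans`, and a held-out test set would be needed to judge predictive quality.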

Judging Criteria

| CATEGORY | WEIGHTAGE | DETAILS |
|---|---|---|
| Analysis | 30% | Documentation on the goal and what was included in the analysis; how the question was approached; visualisation tools and techniques utilized |
| Results | 30% | How the results derived related to the problem chosen; the ability to trigger potential further analysis |
| Creativity | 40% | How "out of the box" the analysis conducted is; whether the publication is properly motivated and adds value |

Part 1 - Exploratory Data Analysis

How many loans are not paid back?

# not_fully_paid is 1 for loans that were not fully paid back by the borrower, 0 otherwise
table(loans$not_fully_paid)
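
Raw counts are often easier to read as shares. Because `not_fully_paid` is coded 0/1, its mean equals the fraction of loans that were not fully paid. A minimal sketch on a made-up vector (the values are hypothetical, not from the dataset):

```r
# Hypothetical 0/1 outcomes standing in for loans$not_fully_paid
not_fully_paid <- c(0, 0, 1, 0, 1, 0, 0, 0)

counts <- table(not_fully_paid)
shares <- prop.table(counts)           # counts divided by the total
share_unpaid <- mean(not_fully_paid)   # mean of a 0/1 variable = share of 1s

shares
share_unpaid
```

On the real data, `prop.table(table(loans$not_fully_paid))` gives the same kind of summary.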

What purposes do people take a loan for?

Next we explore the purposes of the loans, broken down by repayment status.

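# Cross-tabulate repayment status (rows) against loan purpose (columns)
tbl <- table(loans$not_fully_paid, loans$purpose, dnn = c("not_fully_paid","purpose"))
tbl
# Row proportions: the purpose mix within each repayment status.
# The first call selects the row margin by name, the second by index.
proportions(tbl, "not_fully_paid")
prop.table(tbl, 1)
# Column proportions: the repayment split within each purpose
prop.table(tbl, 2)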
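
The `margin` argument decides what each proportion is relative to, which is easy to mix up. The sketch below uses a small table of made-up counts (not taken from the dataset) to show that margin 1 makes each row sum to 1 and margin 2 makes each column sum to 1.

```r
# Hypothetical counts, for illustration only
toy <- as.table(matrix(
  c(30, 10, 50, 10), nrow = 2,
  dimnames = list(not_fully_paid = c("0", "1"),
                  purpose = c("credit_card", "debt_consolidation"))
))

by_status  <- prop.table(toy, 1)  # rows sum to 1: purpose mix within a repayment status
by_purpose <- prop.table(toy, 2)  # columns sum to 1: repayment split within a purpose

rowSums(by_status)
colSums(by_purpose)
by_purpose["1", ]  # share of loans not fully paid, per purpose
```

To compare how risky each purpose is, the column proportions are the relevant ones. The row proportions instead answer which purposes dominate among paid and unpaid loans.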

The most common purpose appears to be "debt_consolidation", and the following bar plot confirms it.

options(repr.plot.width = 12)

# Stacked bar chart of loan counts per purpose, split by repayment status
loans %>% 
    ggplot(aes(x = purpose, fill = as.factor(not_fully_paid))) + 
    geom_bar(position = "stack") + 
    coord_flip() + 
    labs(fill = "not_fully_paid") + 
    theme(legend.position = "top")
# Are loans fully paid back by borrowers that meet credit policy criteria?
table(loans$not_fully_paid, loans$credit_policy, dnn = c("not_fully_paid", "credit_policy"))
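
The two `credit_policy` groups differ in size, so counts alone do not show which group repays more often. One way to compare them is the share of loans not fully paid within each group. A minimal sketch using `tapply()` on made-up rows (the data frame `toy` is hypothetical, not from the dataset):

```r
# Hypothetical rows, for illustration only
toy <- data.frame(
  credit_policy  = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 1),
  not_fully_paid = c(0, 0, 0, 1, 0, 1, 0, 1, 0, 0)
)

# Mean of the 0/1 outcome within each credit_policy group
# = share of loans not fully paid in that group
unpaid_rate <- tapply(toy$not_fully_paid, toy$credit_policy, mean)
unpaid_rate
```

On the real data, `tapply(loans$not_fully_paid, loans$credit_policy, mean)` would give the corresponding rates.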