Competition - Loan Data

Loan Data

Ready to put your coding skills to the test? Join us for our Workspace Competition!
For more information, visit datacamp.com/workspacecompetition

Context

This dataset (source) consists of data from almost 10,000 borrowers who took loans, with some paid back and others still in progress. It was extracted from lendingclub.com, an organization that connects borrowers with investors. We've included a few suggested questions at the end of this template to help you get started.

Load packages

library(skimr)
library(tidyverse)

Load your Data

loans <- readr::read_csv('data/loans.csv.gz',show_col_types = FALSE)
# skim(loans) %>% 
  # #select(-(numeric.p0:numeric.p100)) %>%
  # select(-(complete_rate))
As the data summary above shows, there are no missing values (NA) in the dataset, so we can proceed.
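The "no missing values" claim can also be checked directly, without reading the full skim summary. The sketch below uses only base R on a small made-up data frame (`toy_loans` is a hypothetical stand-in for `loans`, and its values are invented for illustration); the same two calls should work unchanged on the real data.

```r
# Toy stand-in for `loans`; the values are made up for illustration only.
toy_loans <- data.frame(
  credit_policy  = c(1, 1, 0, 1),
  purpose        = c("credit_card", "debt_consolidation", "all_other", "credit_card"),
  log_annual_inc = c(11.35, 11.08, 10.37, 11.30)
)

# Count missing values per column; all zeros means nothing needs imputing.
na_per_column <- colSums(is.na(toy_loans))
print(na_per_column)

# anyNA() gives a single TRUE/FALSE answer for the whole data frame.
has_missing <- anyNA(toy_loans)
print(has_missing)
```

On the real dataset, replacing `toy_loans` with `loans` gives the per-column counts in one line.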

Let's analyze the data

loans_grouped_credit_purpose <- loans %>% 
  # Recode before grouping and sorting so each label stays on its own row
  # (assigning factor(loans$credit_policy) after arrange() would misalign them)
  mutate(credit_policy = factor(credit_policy,
                                levels = c(0, 1),
                                # 0 : default : doesn't meet the criteria 
                                # 1 : meet_criteria : meets the criteria to take a loan 
                                labels = c("default", "meet_criteria"))) %>% 
  group_by(credit_policy, purpose) %>% 
  arrange(desc(log_annual_inc))

# check
prop.table(table(loans_grouped_credit_purpose$credit_policy)) %>% round(2)

The proportion of candidates who met the credit policy criteria is 0.8.

What is the purpose with the highest proportion among borrowers?

# One row per (credit_policy, purpose); prop is the share of all borrowers
df <- loans_grouped_credit_purpose %>% 
  summarize(count = n(),
            prop = round(count / nrow(loans_grouped_credit_purpose), 2)) %>% 
  arrange(desc(prop))


df %>% 
  ggplot( aes(purpose, fill = credit_policy) ) +
  geom_col(aes(y = prop),position ="dodge") +
  labs(title ="credit purpose by credit policy") + 
  theme(axis.text.x = element_text(hjust = 0.10,angle =- 45))

Overall, the purpose with the highest proportion is debt consolidation.
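Because `df` splits each purpose by credit policy, the "overall" ranking is easier to read from proportions computed on purpose alone. A minimal base R sketch of that calculation follows; `toy_purpose` is a hypothetical vector with invented values standing in for `loans$purpose`.

```r
# Made-up purposes for illustration; on the real data use loans$purpose.
toy_purpose <- c("debt_consolidation", "credit_card", "debt_consolidation",
                 "all_other", "debt_consolidation", "credit_card")

# Share of each purpose across all borrowers, largest first.
overall_prop <- sort(prop.table(table(toy_purpose)), decreasing = TRUE)
print(round(overall_prop, 2))

# Name of the most common purpose.
top_purpose <- names(overall_prop)[1]
print(top_purpose)
```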

Among those who meet the credit underwriting criteria, what are the top 3 loan purposes?

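# numerical summary: top 3 purposes among borrowers who meet the criteria
# (df also holds rows for borrowers who do not, so filter before taking the head)
top_3_loans <- df %>% 
  ungroup() %>% 
  filter(credit_policy == "meet_criteria") %>% 
  head(3)

# loans_meet_criteria_top3
loans_meet_criteria_top3 <- loans %>%  
  filter(credit_policy == 1, purpose %in% top_3_loans$purpose)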
  

# plot
loans_meet_criteria_top3 %>% 
  #filter(purpose %in% c(top_3_loans$purpose)) %>% 
  ggplot(aes(purpose)) +
  geom_bar() +
  labs(title = "Top 3 loan purposes among those who meet credit policy criteria") + 
  theme(axis.text.x = element_text(hjust = 0.10,angle =- 45))

Debt consolidation is the most common purpose among those who meet the credit policy criteria, accounting for `r round(top_3_loans$prop[1], 2)` of all borrowers.

For those who meet the criteria, what is the average annual income for each purpose?

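loans_grouped_credit_purpose %>% 
  filter(credit_policy == "meet_criteria") %>% 
  # exp(mean(log(x))) is the geometric mean of income, which is less
  # sensitive to a few very high earners than the arithmetic mean
  summarize(avg_annual_inc = exp(mean(log_annual_inc))) %>% 
  arrange(desc(avg_annual_inc)) %>% 
  # slice_max() replaces the superseded top_n() and names the ranking column
  slice_max(avg_annual_inc, n = 3)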

Borrowers with the highest average annual income tend to take loans for home improvement, small business, and credit cards.
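One caveat about the average used here: taking `exp()` of the mean of a log-transformed variable gives the geometric mean, not the arithmetic mean. The sketch below shows the difference on three invented incomes (`toy_income` is hypothetical, not taken from the dataset).

```r
# Three made-up annual incomes, one of them much larger than the others.
toy_income <- c(20000, 40000, 180000)
log_income <- log(toy_income)

# Back-transforming the mean of the logs gives the geometric mean.
geometric_mean <- exp(mean(log_income))

# Back-transforming first and then averaging gives the arithmetic mean.
arithmetic_mean <- mean(exp(log_income))

print(c(geometric = geometric_mean, arithmetic = arithmetic_mean))
```

The geometric mean is never larger than the arithmetic mean, so rankings based on it may differ when a purpose has a few very high incomes.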

Is there any association between interest rate and the top 3 loan purposes?
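One possible way to approach this question is to compare the distribution of interest rates across the three purposes and then run a rank-based test. The sketch below is only an illustration under assumptions: it assumes the dataset has an interest rate column named `int_rate`, it uses an invented data frame (`toy_rates`) rather than real results, and the Kruskal-Wallis test is one reasonable choice among several, not necessarily the intended one.

```r
# Made-up interest rates for three purposes; values are for illustration only.
toy_rates <- data.frame(
  purpose  = rep(c("debt_consolidation", "all_other", "credit_card"), each = 5),
  int_rate = c(0.13, 0.14, 0.12, 0.15, 0.13,
               0.11, 0.12, 0.10, 0.13, 0.12,
               0.11, 0.12, 0.13, 0.11, 0.12)
)

# Median interest rate per purpose as a first descriptive comparison.
median_rate <- aggregate(int_rate ~ purpose, data = toy_rates, FUN = median)
print(median_rate)

# Kruskal-Wallis rank sum test: do the rate distributions differ by purpose?
kw <- kruskal.test(int_rate ~ purpose, data = toy_rates)
print(kw)
```

On the real data, the same calls could be applied to `loans_meet_criteria_top3`, and a boxplot of `int_rate` by `purpose` would show the comparison visually.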