K-Nearest Neighbors (KNN) Classification with R tutorial
    Loan Data

    This dataset consists of more than 9,500 loans, with information on the loan structure, the borrower, and whether the loan was paid back in full. The data was extracted from LendingClub.com, a company that connects borrowers with investors.

    suppressPackageStartupMessages(library(tidyverse))
    
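    # Load the loans data and drop the categorical `purpose` column so that only numeric columns remain
    data <- read_csv('data/loans.csv.gz', show_col_types = FALSE)
    data <- subset(data, select = -c(purpose))
    head(data, 3)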

    KNN with class

    Train and Test Split

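    library(caTools)
    set.seed(255)  # make the random split reproducible
    # Keep 75% of the rows for training and hold out the rest for testing
    split <- sample.split(data$not_fully_paid, SplitRatio = 0.75)
    train <- subset(data, split == TRUE)
    test <- subset(data, split == FALSE)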

    Feature Scaling

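    # Standardize the features (column 13 is the target) and reuse the training mean and SD for the test set
    train_scaled <- scale(train[-13])
    test_scaled <- scale(test[-13], center = attr(train_scaled, "scaled:center"),
                         scale = attr(train_scaled, "scaled:scale"))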

    Training KNN Classifier and Predicting

    library(class)
    # Classify each test row by majority vote among its 10 nearest training rows
    test_pred <- knn(train = train_scaled, test = test_scaled,
                     cl = train$not_fully_paid, k = 10)
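
    `knn()` has no separate training step: for each test row it finds the k closest training rows by Euclidean distance and takes a majority vote. The idea can be sketched in base R as follows. The toy data and the helper `knn_one` are made up for illustration and are not part of the `class` package.

```r
# Toy training set: two numeric features and a binary label (made-up values).
train_x <- matrix(c(1.0, 1.1,
                    1.2, 0.9,
                    0.8, 1.0,
                    3.0, 3.2,
                    3.1, 2.9,
                    2.9, 3.0),
                  ncol = 2, byrow = TRUE)
train_y <- factor(c(0, 0, 0, 1, 1, 1))

# Hypothetical helper: classify one point by majority vote among its k nearest neighbors.
knn_one <- function(x_new, x, y, k) {
  # Euclidean distance from x_new to every training row
  d <- sqrt(rowSums(sweep(x, 2, x_new)^2))
  # Labels of the k closest training rows
  nearest <- y[order(d)[seq_len(k)]]
  # Most frequent label among them (in this sketch, ties go to the first level)
  names(which.max(table(nearest)))
}

pred_a <- knn_one(c(1.0, 1.0), train_x, train_y, k = 3)
pred_b <- knn_one(c(3.0, 3.0), train_x, train_y, k = 3)
```

    Unlike this sketch, `knn()` breaks ties at random, which is one reason to call `set.seed()` before it when results need to be reproducible.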

    Model Evaluation

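    actual <- test$not_fully_paid

    # Confusion matrix: rows are actual classes, columns are predicted classes
    cm <- table(actual, test_pred)
    cm
    # Accuracy is the share of predictions that fall on the diagonal
    accuracy <- sum(diag(cm)) / length(actual)
    sprintf("Accuracy: %.2f%%", accuracy * 100)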
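
    To make the accuracy calculation concrete, here is a small worked example on made-up labels (the `toy_*` names are illustrative only). It also computes per-class recall, which is a common companion to accuracy when one class is much rarer than the other.

```r
# Made-up actual and predicted labels for ten cases.
toy_actual <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1), levels = c(0, 1))
toy_pred   <- factor(c(0, 0, 0, 0, 0, 1, 0, 0, 1, 1), levels = c(0, 1))

# Rows are actual classes, columns are predicted classes.
toy_cm <- table(toy_actual, toy_pred)

# Correct predictions sit on the diagonal: 5 for class 0 and 2 for class 1.
toy_accuracy <- sum(diag(toy_cm)) / sum(toy_cm)

# Recall per class: correct predictions divided by the actual count of that class.
toy_recall <- diag(toy_cm) / rowSums(toy_cm)
```

    Here the accuracy is 7/10, but recall for class 1 is only 2/4, so a single accuracy figure can hide weak performance on the rarer class.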

    KNN with caret

    Train and Test Split

    suppressPackageStartupMessages(library(caret))
    set.seed(255)
    
    data$not_fully_paid <- factor(data$not_fully_paid, levels = c(0, 1))
    
    trainIndex <- createDataPartition(data$not_fully_paid,
                                      times = 1,
                                      p = 0.8,
                                      list = FALSE)
    train <- data[trainIndex, ]
    test <- data[-trainIndex, ]
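
    When the outcome is a factor, `createDataPartition()` samples within each class so that the class proportions stay roughly the same in both splits. A rough base R sketch of that idea, on made-up labels rather than the loan data, might look like this. It is not caret's implementation.

```r
set.seed(1)

# Made-up imbalanced labels: 80 cases of class 0 and 20 of class 1.
labels <- factor(rep(c(0, 1), times = c(80, 20)))

# Sample 80% of the row indices separately within each class.
idx <- unlist(lapply(split(seq_along(labels), labels),
                     function(i) sample(i, size = round(0.8 * length(i)))),
              use.names = FALSE)

train_labels <- labels[idx]
test_labels  <- labels[-idx]

# Both splits keep the original 80/20 class ratio.
table(train_labels)
table(test_labels)
```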