Supervised Learning in R: Classification

Course Notes

Note that the data from the course is not yet added to this workspace. You will need to navigate to the course overview page, download any data you wish to use, and add it to the file browser.

library(tidyverse)


Ch. 1 - k-Nearest Neighbors (kNN)

# Import any packages you want to use here
library(class)
data <- read_csv("Traffic-Sign-Image-Data .csv", show_col_types = FALSE)
signs <- filter(data, sample == "train") %>% select(-c(id, sample))
signs_test <- filter(data, sample == "test") %>% select(-c(id, sample))
sign_types <- signs$sign_type
signs_actual <- signs_test$sign_type
# How many classes are being predicted?
table(signs_actual)
cat("Total classes in signs_actual =", length(unique(signs_actual)), "\n")

Classifying a collection of road signs

# Use kNN to identify the test road signs
sign_types <- signs$sign_type
signs_pred <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types)

# Create a confusion matrix of the predicted versus actual values
signs_actual <- signs_test$sign_type
table(signs_pred, signs_actual)

# Compute the accuracy
mean(signs_pred == signs_actual)
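The accuracy computed above is the share of the confusion matrix that sits on its diagonal. The following is a minimal sketch of that relationship. It assumes the built-in iris data as a stand-in for the traffic-sign file, and the split size, seed, and variable names are illustrative choices, not part of the course.

```r
library(class)  # provides knn()

set.seed(42)  # the train/test split below is random; knn() also breaks ties at random

# Assumption: iris replaces the course data so the example runs on its own
train_idx <- sample(nrow(iris), 100)
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

pred <- knn(train = train_x, test = test_x, cl = train_y)

# Rows are predicted classes, columns are actual classes
conf_mat <- table(predicted = pred, actual = test_y)

# Overall accuracy: correct predictions (the diagonal) over all predictions
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions divided by the actual count of each class
recall <- diag(conf_mat) / colSums(conf_mat)

conf_mat
accuracy
recall
```

Here `accuracy` should match `mean(pred == test_y)`, while `recall` shows which classes account for the errors.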

Testing other 'k' values

# Compute the accuracy of the baseline model (default k = 1)
k_1 <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types)
mean(k_1 == signs_actual)

# Modify the above to set k = 7
k_7 <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types, k = 7)
mean(k_7 == signs_actual)

# Set k = 15 and compare to the above
k_15 <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types, k = 15)
mean(k_15 == signs_actual)
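The three comparisons above repeat the same call with a different `k`. One way to compact this is to loop over candidate values with `sapply()`. This is a sketch only, assuming the built-in iris data in place of the traffic-sign file; the seed, split, and the name `accuracy_by_k` are illustrative.

```r
library(class)  # provides knn()

set.seed(1)  # fixes the random split and any random tie-breaking inside knn()

# Assumption: iris stands in for the course data
train_idx <- sample(nrow(iris), 100)
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

k_values <- c(1, 7, 15)

# Fit one kNN classifier per k and record its test-set accuracy
accuracy_by_k <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})
names(accuracy_by_k) <- paste0("k_", k_values)

accuracy_by_k
```

The same pattern should carry over to the sign data by swapping in `signs[-1]`, `signs_test[-1]`, `sign_types`, and `signs_actual`.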

Seeing how the neighbors voted

  • Build a kNN model with the prob = TRUE parameter to compute the vote proportions. Set k = 7.

  • Use the attr() function to obtain the vote proportions for the predicted class. These are stored in the attribute "prob".

  • Examine the first several vote outcomes and percentages using the head() function to see how the confidence varies from sign to sign.

# Use the prob parameter to get the proportion of votes for the winning class
sign_pred <- knn(train = signs[-1], test = signs_test[-1], cl = sign_types, prob = TRUE, k = 7)

# Get the "prob" attribute from the predicted classes
sign_prob <- attr(sign_pred, "prob")

# Examine the first several predictions
head(sign_pred)

# Examine the proportion of votes for the winning class
head(sign_prob)
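Once the vote proportions are available, they can be used to flag predictions where the neighbors disagreed. The sketch below assumes the built-in iris data as a stand-in for the sign data, and the 0.75 cutoff is an arbitrary illustrative threshold, not a value from the course.

```r
library(class)  # provides knn()

set.seed(7)  # fixes the random split and any random tie-breaking inside knn()

# Assumption: iris stands in for the course data
train_idx <- sample(nrow(iris), 100)
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# prob = TRUE attaches the winning class's share of the votes as an attribute
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 7, prob = TRUE)
vote_share <- attr(pred, "prob")

# Collect predictions, true labels, and vote shares side by side
results <- data.frame(
  predicted  = as.character(pred),
  actual     = as.character(test_y),
  vote_share = vote_share
)

# Hypothetical cutoff: treat anything under 75% agreement as uncertain
uncertain <- results[results$vote_share < 0.75, ]
uncertain
```

Rows in `uncertain` are the cases where fewer than three quarters of the neighbors agreed on the predicted class, which makes them natural candidates for manual review.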