If you want to take our Machine Learning Toolbox course, here is the link.
Calculate a confusion matrix
As you saw in the video, a confusion matrix is a very useful tool for calibrating the output of a model and examining all possible outcomes of your predictions (true positive, true negative, false positive, false negative).
Before you make your confusion matrix, you need to "cut" your predicted probabilities at a given threshold to turn probabilities into class predictions. You can do this easily with the ifelse() function, e.g.:
class_prediction <- ifelse(probability_prediction > 0.50, "positive_class", "negative_class" )
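As a quick illustration of the cut, here is a minimal sketch on a handful of made-up probabilities (the values are hypothetical, not taken from the course data):

```r
# Hypothetical predicted probabilities of the positive class
probability_prediction <- c(0.92, 0.35, 0.50, 0.71)

# Cut at 0.50: only values strictly greater than the threshold become positive
class_prediction <- ifelse(probability_prediction > 0.50,
                           "positive_class", "negative_class")

print(class_prediction)
```

Because the comparison is strict, a probability of exactly 0.50 falls into the negative class here.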
You could make such a contingency table with the table() function in base R, but the confusionMatrix() function in caret yields a lot of useful ancillary statistics in addition to the base rates in the table. You can calculate the confusion matrix (and the associated statistics) from the predicted outcomes and the actual outcomes.
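To make the "base rates" concrete, here is a rough base R sketch that builds the contingency table with table() and derives a few of the statistics by hand. The labels and values are invented for illustration, and "M" is assumed to be the positive class:

```r
# Hypothetical class predictions and true labels (M = mine, R = rock)
predicted <- factor(c("M", "M", "R", "R", "M", "R", "M", "R"), levels = c("M", "R"))
actual    <- factor(c("M", "R", "R", "R", "M", "M", "M", "R"), levels = c("M", "R"))

# Rows are predictions, columns are actual classes
tab <- table(Predicted = predicted, Actual = actual)
print(tab)

# The four possible outcomes, treating "M" as the positive class
tp <- tab["M", "M"]  # true positives
fp <- tab["M", "R"]  # false positives
fn <- tab["R", "M"]  # false negatives
tn <- tab["R", "R"]  # true negatives

# A few statistics that would otherwise have to be computed manually
accuracy    <- (tp + tn) / sum(tab)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
```

This is the bookkeeping that a dedicated confusion matrix function saves you from writing yourself.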
- Turn the numeric predictions p into a vector of class predictions called p_class, using a prediction cutoff of 0.50. Make sure to use "M" for the positive class and "R" for the negative class when making predictions, to match the classes in the original data.
- Make a confusion matrix using p_class, the actual values in the test set, and the confusionMatrix() function.
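The two steps above might look roughly like the sketch below. It assumes the caret package (and its dependencies) is installed. The values of p, the contents of test, and the column name Class are hypothetical stand-ins for the exercise objects, which are not shown here:

```r
library(caret)

# Hypothetical stand-ins for the exercise objects:
# p holds predicted probabilities of class "M",
# test is a data frame whose Class column (assumed name) holds the true labels.
p    <- c(0.81, 0.64, 0.12, 0.47, 0.93, 0.30)
test <- data.frame(Class = factor(c("M", "R", "R", "M", "M", "R"),
                                  levels = c("M", "R")))

# Step 1: cut at 0.50, using the same labels as the original data
p_class <- ifelse(p > 0.50, "M", "R")

# Step 2: confusionMatrix() expects factors with matching levels,
# predictions first and actual values second
cm <- confusionMatrix(factor(p_class, levels = levels(test$Class)),
                      test$Class)
print(cm)
```

Converting p_class to a factor with the same levels as the actual values avoids a level-mismatch error, for example when every prediction happens to land in one class.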
If that makes sense, keep going to the next exercise! If not, here is an overview video.
Overview Video on Confusion Matrix
From probabilities to confusion matrix
Conversely, say you want to be really certain that your model correctly identifies all the mines as mines. In this case, you might use a prediction threshold of 0.10, instead of 0.90.
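The trade-off can be seen on a toy example. The sketch below uses invented probabilities and labels, and outcomes_at() is a hypothetical helper written only for this illustration:

```r
# Hypothetical predicted probabilities that each object is a mine ("M")
probability <- c(0.95, 0.60, 0.20, 0.15, 0.55, 0.05)
actual      <- c("M",  "M",  "M",  "R",  "R",  "R")

# Hypothetical helper: count detected mines and false alarms at a threshold
outcomes_at <- function(threshold) {
  pred <- ifelse(probability > threshold, "M", "R")
  c(mines_found  = sum(pred == "M" & actual == "M"),
    false_alarms = sum(pred == "M" & actual == "R"))
}

low  <- outcomes_at(0.10)  # permissive: almost everything is called a mine
high <- outcomes_at(0.90)  # strict: only very confident predictions are mines

print(rbind(threshold_0.10 = low, threshold_0.90 = high))
```

On this toy data the 0.10 threshold finds all three mines at the cost of two false alarms, while the 0.90 threshold raises no false alarms but finds only one mine.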
You can construct the confusion matrix in the same way you did before, using your new predicted classes:
pred <- ifelse(probability > threshold, "M", "R")
You can then call the confusionMatrix() function in the same way as in the previous exercise.
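For instance, with a threshold of 0.10 the call could be sketched as follows. This assumes caret is installed, and the probabilities and labels are again invented for illustration:

```r
library(caret)

# Hypothetical predicted probabilities of class "M" and the true labels
probability <- c(0.95, 0.60, 0.20, 0.15, 0.55, 0.05)
actual      <- factor(c("M", "M", "M", "R", "R", "R"), levels = c("M", "R"))

# Re-cut the probabilities at the new, more permissive threshold
threshold <- 0.10
pred <- ifelse(probability > threshold, "M", "R")

# Same call as before: predictions first, actual values second
cm <- confusionMatrix(factor(pred, levels = levels(actual)), actual)

# Pull out the statistics most affected by the threshold choice
print(cm$byClass[c("Sensitivity", "Specificity")])
```

With "M" as the positive class, sensitivity is the share of actual mines that were flagged, so it is the number to watch when the goal is to miss as few mines as possible.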
If you want to learn more from this course, here is the link.