Understanding Confusion Matrix in R

This tutorial takes course material from DataCamp's Machine Learning Toolbox course and allows you to practice confusion matrices in R.

Oct 15, 2018 · 3 min read

If you want to take our Machine Learning Toolbox course, here is the link.

Calculate a confusion matrix

As you saw in the video, a confusion matrix is a useful tool for calibrating the output of a model and examining all four possible outcomes of your predictions: true positives, true negatives, false positives, and false negatives.
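To make the four outcomes concrete, here is a minimal base R sketch that counts each one by hand. The vectors `actual` and `predicted` are made-up illustration data (not the course's data set), and "M" is assumed to be the positive class.

```r
# Made-up labels for illustration: "M" is the positive class, "R" the negative class
actual    <- c("M", "M", "M", "R", "R", "R", "M", "R")
predicted <- c("M", "R", "M", "R", "M", "R", "M", "R")

# Count each of the four possible outcomes
true_positive  <- sum(predicted == "M" & actual == "M")
true_negative  <- sum(predicted == "R" & actual == "R")
false_positive <- sum(predicted == "M" & actual == "R")
false_negative <- sum(predicted == "R" & actual == "M")

outcomes <- c(TP = true_positive, TN = true_negative,
              FP = false_positive, FN = false_negative)
print(outcomes)
```

Every observation falls into exactly one of the four cells, so the counts always add up to the number of predictions.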

Before you make your confusion matrix, you need to "cut" your predicted probabilities at a given threshold to turn probabilities into class predictions. You can do this easily with the ifelse() function, e.g.:

class_prediction <-
  ifelse(probability_prediction > 0.50,
         "positive_class",
         "negative_class"
  )
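As a small runnable illustration of that cut, the sketch below applies the same ifelse() call to a hypothetical probability vector. The values are invented to show two details that are easy to overlook: the comparison is strict, so a probability of exactly 0.50 lands in the negative class, and a missing probability stays missing.

```r
# Hypothetical predicted probabilities, including an edge case and a missing value
probability_prediction <- c(0.91, 0.50, 0.12, 0.67, NA)

# ifelse() is vectorized: it evaluates the condition for every element at once
class_prediction <- ifelse(probability_prediction > 0.50,
                           "positive_class",
                           "negative_class")
print(class_prediction)
```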

You could make such a contingency table with the table() function in base R, but confusionMatrix() in caret yields a lot of useful ancillary statistics in addition to the base rates in the table. You can calculate the confusion matrix (and the associated statistics) using the predicted outcomes as well as the actual outcomes, both supplied as factors with the same levels, e.g.:

confusionMatrix(predicted, actual)
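The sketch below compares the two approaches on made-up data. It builds the contingency table with table(), computes three of the statistics by hand with "M" assumed to be the positive class, and calls caret's confusionMatrix() only if the package happens to be installed. The data and variable names are illustrative assumptions.

```r
# Made-up labels for illustration; both vectors are factors with identical levels
lv        <- c("M", "R")
actual    <- factor(c("M", "M", "M", "R", "R", "R", "M", "R"), levels = lv)
predicted <- factor(c("M", "R", "M", "R", "M", "R", "M", "R"), levels = lv)

# Rows are predictions, columns are actual values
tab <- table(Prediction = predicted, Reference = actual)
print(tab)

# A few statistics computed by hand, with "M" as the positive class
accuracy    <- sum(diag(tab)) / sum(tab)
sensitivity <- tab["M", "M"] / sum(tab[, "M"])  # true positive rate
specificity <- tab["R", "R"] / sum(tab[, "R"])  # true negative rate

# If caret is installed, confusionMatrix() prints the table plus further statistics
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(predicted, actual, positive = "M"))
}
```

Passing `positive = "M"` makes the choice of positive class explicit rather than relying on the order of the factor levels.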

Instructions

Turn the numeric predictions p into a vector of class predictions called p_class, using a prediction cutoff of 0.50. Make sure to use "M" for the positive class and "R" for the negative class when making predictions, to match the classes in the original data.
Make a confusion matrix using p_class, the actual values in the test set, and the confusionMatrix() function.
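One detail worth checking when you build p_class is the set of factor levels. The following sketch, using hypothetical probabilities and classes rather than the exercise data, shows what can happen when every prediction falls into a single class: a table built from the raw character vector loses a row, while a factor whose levels match the actual classes keeps the full 2 x 2 layout.

```r
# Hypothetical predicted probabilities and actual classes for a small test set
p      <- c(0.95, 0.80, 0.62, 0.55, 0.71, 0.58)
actual <- factor(c("M", "M", "R", "R", "M", "R"), levels = c("M", "R"))

# Cut at 0.50: every probability here exceeds the cutoff, so every prediction is "M"
m_or_r <- ifelse(p > 0.50, "M", "R")

# Without explicit levels, the "R" row is missing from the table
tab_character <- table(m_or_r, actual)

# With levels matching the actual classes, the table stays 2 x 2
p_class    <- factor(m_or_r, levels = levels(actual))
tab_factor <- table(p_class, actual)

print(dim(tab_character))
print(dim(tab_factor))
```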


If that makes sense, keep going to the next exercise! If not, here is an overview video.

Overview Video on Confusion Matrix

From probabilities to confusion matrix

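Conversely, say you want to be really certain that your model correctly identifies all the mines as mines, so that as few real mines as possible are missed. In this case, you might use a prediction threshold of 0.10, instead of 0.90. A lower threshold labels more observations as mines, which catches more of the true mines at the cost of more false positives.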

You can construct the confusion matrix in the same way you did before, using your new predicted classes:

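# Cut at the threshold, then convert to a factor with the same levels as `actual`
pred <- factor(ifelse(probability > threshold, "M", "R"), levels = levels(actual))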

You can then call the confusionMatrix() function in the same way as in the previous exercise:

confusionMatrix(pred, actual)
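To see how the choice of threshold shifts the confusion matrix, the sketch below tabulates three cutoffs side by side in base R. The probabilities, the classes, and the helper `summarise_threshold()` are all hypothetical and exist only for this illustration.

```r
# Hypothetical probabilities of being a mine ("M") and the actual classes
probability <- c(0.97, 0.85, 0.40, 0.15, 0.60, 0.30, 0.08, 0.05)
actual      <- factor(c("M", "M", "M", "M", "R", "R", "R", "R"),
                      levels = c("M", "R"))

# Hypothetical helper: build the confusion table for one threshold and summarise it
summarise_threshold <- function(threshold) {
  pred <- factor(ifelse(probability > threshold, "M", "R"),
                 levels = levels(actual))
  tab  <- table(pred, actual)
  c(threshold       = threshold,
    predicted_mines = sum(tab["M", ]),   # everything labelled "M"
    missed_mines    = tab["R", "M"],     # false negatives
    false_alarms    = tab["M", "R"])     # false positives
}

# One row per threshold
results <- t(sapply(c(0.10, 0.50, 0.90), summarise_threshold))
print(results)
```

In this toy data, lowering the threshold increases the number of predicted mines and reduces the number of missed mines, while the number of false alarms goes up. Which trade-off is acceptable depends on the relative cost of each kind of error.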


If you want to learn more from this course, here is the link.
