If you want to take our Machine Learning Toolbox course, here is the link.
Calculate a confusion matrix
As you saw in the video, a confusion matrix is a very useful tool for calibrating the output of a model and examining all possible outcomes of your predictions (true positive, true negative, false positive, false negative).
Before you make your confusion matrix, you need to "cut" your predicted probabilities at a given threshold to turn probabilities into class predictions. You can do this easily with the ifelse() function, e.g.:
class_prediction <-
  ifelse(probability_prediction > 0.50,
         "positive_class",
         "negative_class"
  )
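To make the cut concrete, here is a minimal runnable sketch. The probability values are made up for illustration and are not from the course data. Note that the comparison is strict, so a probability of exactly 0.50 falls into the negative class.

```r
# Hypothetical predicted probabilities (illustrative values only)
probability_prediction <- c(0.05, 0.35, 0.50, 0.51, 0.92)

# Cut at 0.50: only values strictly above the threshold become the positive class
class_prediction <- ifelse(probability_prediction > 0.50,
                           "positive_class",
                           "negative_class")

print(class_prediction)
```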
You could make such a contingency table with the table() function in base R, but confusionMatrix() in caret yields a lot of useful ancillary statistics in addition to the base rates in the table. You can calculate the confusion matrix (and the associated statistics) using the predicted outcomes as well as the actual outcomes, e.g.:
confusionMatrix(predicted, actual)
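As a rough illustration of what those ancillary statistics summarize, the sketch below builds the contingency table with base R's table() and computes accuracy, sensitivity, and specificity by hand. The labels are hypothetical, and "M" is assumed to be the positive class; caret reports these and further statistics for you.

```r
# Hypothetical class predictions and true labels (made up for illustration)
predicted <- factor(c("M", "M", "R", "M", "R", "R", "M", "R"), levels = c("M", "R"))
actual    <- factor(c("M", "R", "R", "M", "M", "R", "M", "R"), levels = c("M", "R"))

# Base R contingency table: rows are predictions, columns are true classes
cm <- table(Prediction = predicted, Reference = actual)
print(cm)

# Treat "M" as the positive class (an assumption for this sketch)
tp <- cm["M", "M"]  # true positives
fp <- cm["M", "R"]  # false positives
fn <- cm["R", "M"]  # false negatives
tn <- cm["R", "R"]  # true negatives

accuracy    <- (tp + tn) / sum(cm)  # share of correct predictions
sensitivity <- tp / (tp + fn)       # true positive rate
specificity <- tn / (tn + fp)       # true negative rate
```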
Instructions
- Turn the numeric predictions p into a vector of class predictions called p_class, using a prediction cutoff of 0.50. Make sure to use "M" for the positive class and "R" for the negative class when making predictions, to match the classes in the original data.
- Make a confusion matrix using p_class, the actual values in the test set, and the confusionMatrix() function.
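The two steps above might look like the following sketch. The objects p and test here are small stand-ins invented for illustration, not the Sonar data used in the exercise. The predictions are converted to a factor with the same levels as the true classes, since confusionMatrix() expects matching factor levels. The caret call runs only if the package is installed; otherwise the sketch falls back to table().

```r
# Stand-ins for the exercise objects (assumed, not the real Sonar data):
# `test` holds the true classes, `p` holds predicted probabilities of "M"
test <- data.frame(
  Class = factor(c("M", "R", "M", "R", "M", "R"), levels = c("M", "R"))
)
p <- c(0.91, 0.20, 0.65, 0.55, 0.30, 0.08)

# If p exceeds the threshold of 0.50, predict "M", else "R"
m_or_r <- ifelse(p > 0.50, "M", "R")

# Convert to a factor with the same levels as the true classes
p_class <- factor(m_or_r, levels = levels(test[["Class"]]))

# Use caret when it is installed; otherwise fall back to base R's table()
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(p_class, test[["Class"]]))
} else {
  print(table(Prediction = p_class, Reference = test[["Class"]]))
}
```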
If that makes sense, keep going to the next exercise! If not, here is an overview video.
Overview Video on Confusion Matrix
From probabilities to confusion matrix
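Conversely, say you want to be really certain that your model correctly identifies all the mines as mines. In this case, you might use a prediction threshold of 0.10 instead of 0.90. With a lower cutoff, more observations are classified as "M", so fewer true mines are missed, at the cost of more rocks being wrongly flagged as mines.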
You can construct the confusion matrix in the same way you did before, using your new predicted classes:
pred <- ifelse(probability > threshold, "M", "R")
You can then call the confusionMatrix() function in the same way as in the previous exercise:
confusionMatrix(pred, actual)
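To see how the choice of threshold shifts the errors, the sketch below compares cutoffs of 0.10 and 0.90 on a handful of made-up probabilities and labels. The helper confusion_at() is a hypothetical name introduced for this illustration, and base R's table() stands in for confusionMatrix() so the example runs without caret.

```r
# Hypothetical probabilities of "M" and true labels (made up for illustration)
probability <- c(0.95, 0.80, 0.60, 0.40, 0.15, 0.12, 0.05, 0.02)
actual <- factor(c("M", "M", "M", "R", "M", "R", "R", "R"), levels = c("M", "R"))

# Hypothetical helper: table of predictions vs. truth at a given threshold
confusion_at <- function(threshold) {
  pred <- factor(ifelse(probability > threshold, "M", "R"), levels = c("M", "R"))
  table(Prediction = pred, Reference = actual)
}

cm_low  <- confusion_at(0.10)
cm_high <- confusion_at(0.90)

# Missed mines: true "M" predicted as "R"
missed_low  <- cm_low["R", "M"]
missed_high <- cm_high["R", "M"]

# False alarms: true "R" predicted as "M"
false_alarms_low  <- cm_low["M", "R"]
false_alarms_high <- cm_high["M", "R"]

print(cm_low)
print(cm_high)
```

In this toy data the low threshold misses no mines but raises some false alarms, while the high threshold does the opposite; real data will show the same trade-off with different counts.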
If you want to learn more from this course, here is the link.