Ordering and Subsetting Factors

This tutorial takes course material from DataCamp's Introduction to R for Finance course and allows you to practice ordering and subsetting factors.


If you want to take our Introduction to R for Finance course, here is the link.

Create an ordered factor

Plot the unordered factor and look at the order of the bars! No order was specified when you created the factor, so R sorted the levels alphabetically, and plot() draws the bars in level order. By now, you know that there is an order to credit ratings, and your plots should reflect that!

As a reminder, the order of credit ratings from least risky to most risky is:

AAA, AA, A, BBB, BB, B, CCC, CC, C, D

To order your factor, there are two options.

When creating a factor, specify ordered = TRUE and add unique levels in order from least to greatest:

credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

credit_factor_ordered <- factor(credit_rating, ordered = TRUE,
                                levels = c("AAA", "AA", "A", "BBB"))

For an existing unordered factor like credit_factor, use the ordered() function:

credit_factor_ordered <- ordered(credit_factor, levels = c("AAA", "AA", "A", "BBB"))

Both ways result in:

credit_factor_ordered

[1] AAA AA  A   BBB AA  BBB A  
Levels: AAA < AA < A < BBB

Notice the < specifying the order of the levels that was not there before!
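To see the two options side by side, here is a small sketch that builds the ordered factor both ways from the example vector above. The helper names rating_levels and from_existing are made up for this illustration. It also shows one practical payoff of the order: comparison operators such as < work on ordered factors.

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")

# Levels listed from least risky to most risky (illustrative helper name)
rating_levels <- c("AAA", "AA", "A", "BBB")

# Option 1: order the factor at creation time
credit_factor_ordered <- factor(credit_rating, ordered = TRUE,
                                levels = rating_levels)

# Option 2: start from an unordered factor and convert it
credit_factor <- factor(credit_rating)
levels(credit_factor)   # alphabetical: "A" "AA" "AAA" "BBB"
from_existing <- ordered(credit_factor, levels = rating_levels)
levels(from_existing)   # "AAA" "AA" "A" "BBB"

# Ordered factors support comparisons: is AAA ranked before BBB?
credit_factor_ordered[1] < credit_factor_ordered[4]   # TRUE
```

Note that the comparison follows the level order you supplied, so "less than" here means "less risky than".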

Instructions

  • The character vector credit_rating is in your workspace.
  • Use the unique() function with credit_rating to print only the unique words in the character vector. These will be your levels.
  • Use factor() to create an ordered factor for credit_rating and store it as credit_factor_ordered. Make sure to list the levels from least to greatest in terms of risk!
  • Plot credit_factor_ordered and note the new order of the bars.
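One possible way to work through these steps is sketched below. The contents of credit_rating are an assumption for illustration (your workspace may hold different values), and rating_scale and present_levels are hypothetical helper names. Instead of typing the levels by hand, the sketch filters the full rating scale down to the ratings that actually occur, which keeps them in risk order.

```r
# Assumed sample data; the vector in your workspace may differ
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# The unique ratings, in order of first appearance
unique(credit_rating)

# Full scale from least risky to most risky, as listed earlier
rating_scale <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC", "CC", "C", "D")

# Keep only the ratings that occur, preserving the scale's order
present_levels <- rating_scale[rating_scale %in% unique(credit_rating)]

credit_factor_ordered <- factor(credit_rating, ordered = TRUE,
                                levels = present_levels)

# Bars now appear in risk order rather than alphabetical order
plot(credit_factor_ordered)
```

Typing the levels directly inside c() works just as well for a short list; the filtering step mainly helps avoid typos, since any rating missing from levels would silently become NA.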

If that makes sense, keep going to the next exercise! If not, here is an overview video.

Overview video ordering and subsetting factors.

Subsetting a factor

You can subset factors in much the same way that you subset vectors. As usual, [ ] is the key! However, R has some interesting behavior when you want to remove a factor level from your analysis. For example, what if you wanted to remove the AAA bond from your portfolio?

credit_factor

[1] AAA AA  A   BBB AA  BBB A  
Levels: BBB < A < AA < AAA

credit_factor[-1]

[1] AA  A   BBB AA  BBB A  
Levels: BBB < A < AA < AAA

R removed the AAA bond at the first position, but left the AAA level behind! If you were to plot this, the bar chart would still reserve a slot for AAA, with an empty bar. A better plan would have been to tell R to drop the AAA level entirely. To do that, add drop = TRUE:

credit_factor[-1, drop = TRUE]

[1] AA  A   BBB AA  BBB A  
Levels: BBB < A < AA

That's what you wanted!
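A quick way to see the leftover level without plotting is table(), which counts every level, including unused ones. The sketch below rebuilds credit_factor from the output shown above; the names kept and dropped are illustrative.

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
credit_factor <- factor(credit_rating, ordered = TRUE,
                        levels = c("BBB", "A", "AA", "AAA"))

# Without drop = TRUE, AAA stays as a level with a count of zero
kept <- credit_factor[-1]
table(kept)
nlevels(kept)      # 4

# With drop = TRUE, the unused AAA level disappears
dropped <- credit_factor[-1, drop = TRUE]
table(dropped)
nlevels(dropped)   # 3
```

The zero count in the first table is exactly what shows up as an empty bar in a plot.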

Instructions

  • Using the same data, remove the "A" bonds from positions 3 and 7 of credit_factor. For now, do not use drop = TRUE. Assign this to keep_level.
  • Plot keep_level.
  • Now, remove "A" from credit_factor again, but this time use drop = TRUE. Assign this to drop_level.
  • Plot drop_level.
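Here is a sketch of how these steps might look, assuming credit_factor is the same seven-bond factor shown earlier in this section. As a variation, the last lines select by value rather than by position, which avoids hard-coding 3 and 7; by_value is a name invented for this example.

```r
credit_rating <- c("AAA", "AA", "A", "BBB", "AA", "BBB", "A")
credit_factor <- factor(credit_rating, ordered = TRUE,
                        levels = c("BBB", "A", "AA", "AAA"))

# Remove positions 3 and 7 but keep the A level
keep_level <- credit_factor[-c(3, 7)]
plot(keep_level)

# Remove positions 3 and 7 and drop the now-unused A level
drop_level <- credit_factor[-c(3, 7), drop = TRUE]
plot(drop_level)

levels(keep_level)   # "BBB" "A" "AA" "AAA"
levels(drop_level)   # "BBB" "AA" "AAA"

# Variation: remove every "A" bond by value instead of by position
by_value <- credit_factor[credit_factor != "A", drop = TRUE]
identical(by_value, drop_level)   # TRUE
```

Selecting by value is handy when you do not know in advance where the bonds you want to remove sit in the vector.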

stringsAsFactors

Do you remember back in the data frame chapter when you used str() on your cash data frame? This was the output:

str(cash)

'data.frame':    3 obs. of  3 variables:
 $ company  : Factor w/ 2 levels "A","B": 1 1 2
 $ cash_flow: num  100 200 300
 $ year     : num  1 3 2

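See how the company column has been converted to a factor? In R versions before 4.0.0, the default behavior of data.frame() is to convert all character columns into factors. Since R 4.0.0, the default is stringsAsFactors = FALSE, so character columns stay as characters unless you ask for factors. The old default has caused countless novice R users a headache trying to figure out why their character columns are not working properly, but not you! You will be prepared!

To turn off this behavior explicitly, which works the same way in every R version: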

cash <- data.frame(company, cash_flow, year, stringsAsFactors = FALSE)

str(cash)

'data.frame':    3 obs. of  3 variables:
 $ company  : chr  "A" "A" "B"
 $ cash_flow: num  100 200 300
 $ year     : num  1 3 2
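Because the default differs between R versions, it can help to set stringsAsFactors explicitly and compare the results. The sketch below recreates the cash data frame from the str() output above; cash_as_factor and cash_as_chr are names chosen for this illustration.

```r
# Values taken from the str() output above
company   <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year      <- c(1, 3, 2)

# Explicitly request the old behavior: characters become factors
cash_as_factor <- data.frame(company, cash_flow, year,
                             stringsAsFactors = TRUE)
class(cash_as_factor$company)    # "factor"
levels(cash_as_factor$company)   # "A" "B"

# Explicitly keep characters as characters
cash_as_chr <- data.frame(company, cash_flow, year,
                          stringsAsFactors = FALSE)
class(cash_as_chr$company)       # "character"
```

Spelling out the argument makes a script behave the same way no matter which R version runs it.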

Instructions

  • Two variables, credit_rating and bond_owners, have been defined for you. bond_owners is a character vector of the names of some of your friends.
  • Create a data frame named bonds from credit_rating and bond_owners, in that order, and use stringsAsFactors = FALSE.
  • Use str() to confirm that both columns are characters.
  • bond_owners would not be a useful factor, but credit_rating could be! Use $ to add a new column to bonds called credit_factor, created from the credit_rating column as a correctly ordered factor.
  • Use str() again to confirm that credit_factor is an ordered factor.
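A possible walk-through of these steps is sketched below. The values in credit_rating and bond_owners are assumptions for illustration; substitute whatever is defined in your own session.

```r
# Assumed sample data for illustration
credit_rating <- c("AAA", "A", "BB")
bond_owners   <- c("Dan", "Tom", "Joe")

# Build the data frame, keeping both columns as characters
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)
str(bonds)

# Add an ordered factor column, levels from least to most risky
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE,
                              levels = c("AAA", "A", "BB"))

# credit_factor should now be reported as an ordered factor (Ord.factor)
str(bonds)
```

Keeping the original character column next to the factor column makes it easy to check that no rating was turned into NA by a mistyped level.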



If you want to learn more from this course, here is the link.
