Data Frames in R

This tutorial uses material from DataCamp's Introduction to R for Finance course to give you practice working with data frames.

If you want to take our free Intro to R course, here is the link.

Accessing and subsetting data frames (1)

Even more often than with vectors, you will want to subset your data frame or access certain columns. Again, one way to do this is with [ ]. The notation works just like it does for matrices: the row index goes before the comma and the column index after it. Here are some examples:

Select the first row: cash[1, ]

Select the first column: cash[ ,1]

Select the first column by name: cash[ ,"company"]
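To try these selections yourself, here is a minimal sketch. It rebuilds cash from the values shown in the printed data frame later in this tutorial; the names first_row, first_col, company_col and single_value are only illustrative.

```r
# Build the example data frame; the values match the table printed
# later in this tutorial.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

first_row    <- cash[1, ]             # still a data frame, with one row
first_col    <- cash[, 1]             # the company column on its own
company_col  <- cash[, "company"]     # the same column, selected by name
single_value <- cash[2, "cash_flow"]  # one cell: row 2 of cash_flow

print(first_row)
print(single_value)  # 4000
```

Notice that selecting a whole row keeps the data frame structure, while selecting a single column returns just the values in that column.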

Instructions

  • Select the third row and second column of cash.
  • Select the fifth row of the "year" column of cash.

If that makes sense, keep going to the next exercise! If not, here is an overview video.

Overview Video on Data Frames

Accessing and subsetting data frames (2)

As you might imagine, selecting a specific column from a data frame is a common manipulation. So common, in fact, that it was given its own shortcut, the $. The following return the same answer:

cash$cash_flow

[1] 1000 4000  550 1500 1100  750 6000

cash[,"cash_flow"]

[1] 1000 4000  550 1500 1100  750 6000

Useful, right? Try it out!
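If you would like to confirm that the two forms really match, the sketch below compares them directly. It assumes cash is an ordinary data.frame built from the values shown in this tutorial; by_dollar and by_bracket are illustrative names.

```r
# Rebuild the example data frame used throughout this tutorial.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

by_dollar  <- cash$cash_flow        # column selected with $
by_bracket <- cash[, "cash_flow"]   # same column selected with [ ]

identical(by_dollar, by_bracket)    # TRUE

# The result is a plain numeric vector, so arithmetic is applied
# to every element at once.
in_thousands <- cash$cash_flow / 1000
print(in_thousands)
```

Because $ hands back a regular vector, anything you already know about vector arithmetic carries over unchanged.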

Instructions

  • Select the "year" column from cash using $.
  • Select the "cash_flow" column from cash using $ and multiply it by 2.
  • You can delete a column by assigning it NULL. Run the code that deletes "company".
  • Now print out cash again.
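Deleting with NULL changes the data frame in place, so it can be worth practicing on a copy first. The sketch below removes a different column, year, from a copy named trimmed (an illustrative name) and leaves cash itself untouched.

```r
# Rebuild the example data frame used throughout this tutorial.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

trimmed <- cash        # work on a copy so the original stays intact
trimmed$year <- NULL   # assigning NULL removes the column

names(trimmed)         # "company" "cash_flow"
names(cash)            # "company" "cash_flow" "year"
```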

Accessing and subsetting data frames (3)

Often, simply selecting a column from a data frame is not enough. What if you are only interested in the cash flows from company A? For more flexibility, try subset()!

subset(cash, company == "A")

  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4

There are a few important things happening here:

  • The first argument you pass to subset() is the name of your data frame, cash.
  • Notice that you shouldn't put company in quotes!
  • The == is the equality operator. It tests to find where two things are equal, and returns a logical vector. There is a lot more to learn about these relational operators, and you can learn all about them in the second finance course, Intermediate R for Finance!
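To see the logical vector that == produces, you can build it by hand and use it inside [ ]. This is only a sketch of what is going on, not necessarily how subset() is implemented; is_a, via_subset and via_bracket are illustrative names.

```r
# Rebuild the example data frame used throughout this tutorial.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# One TRUE or FALSE per row: is this row about company A?
is_a <- cash$company == "A"
print(is_a)  # TRUE TRUE TRUE FALSE FALSE FALSE FALSE

via_subset  <- subset(cash, company == "A")  # column name without quotes
via_bracket <- cash[is_a, ]                  # keep the rows where is_a is TRUE

print(via_subset)
print(via_bracket)
```

Both print the same three rows for company A. Inside subset() you can write company directly, while with [ ] you have to spell out cash$company.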

Instructions

  • Use subset() to select only the rows of cash corresponding to company B.
  • Now subset() rows that have cash flows due in 1 year.

Adding new columns

In a perfect world, you could be 100% certain that you will receive all of your cash flows. But since these are predictions about the future, there is always a chance that someone won't be able to pay! You decide to analyze a worst-case scenario in which you only receive half of your expected cash flow. To save this scenario for later analysis, you add it as a new column to the data frame!

cash$half_cash <- cash$cash_flow * .5

cash

  company cash_flow year half_cash
1       A      1000    1       500
2       A      4000    3      2000
3       A       550    4       275
4       B      1500    1       750
5       B      1100    2       550
6       B       750    4       375
7       B      6000    5      3000

And that's it! Creating new columns in your data frame is as simple as assigning the new information to data_frame$new_column. Often, the newly created column is some transformation of existing columns, so the $ operator really comes in handy here!
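A new column does not have to come from a single existing column. As a small sketch, the code below recreates half_cash and then adds a hypothetical label column (the name label is made up for this illustration) that combines company and year.

```r
# Rebuild the example data frame used throughout this tutorial.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# Transformation of one column: half of each expected cash flow.
cash$half_cash <- cash$cash_flow * 0.5

# Transformation of two columns: a text label such as "A-1".
# `label` is a hypothetical column used only for this example.
cash$label <- paste0(cash$company, "-", cash$year)

names(cash)  # "company" "cash_flow" "year" "half_cash" "label"
print(cash)
```

Each assignment adds one column on the right-hand side of the data frame, in the order you create them.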

Instructions

  • Create a new worst-case scenario where you only receive 25% of your expected cash flow, and add it to the data frame as quarter_cash.
  • What if it took twice as long (in terms of year) to receive your money? Add a new column double_year with this scenario.


If you want to learn more from this course, here is the link.
