Accessing and subsetting data frames (1)
Even more often than with vectors, you are going to want to subset your data frame or access certain columns. Again, one of the ways to do this is to use [ ]. The notation is just like matrices! Here are some examples:
Select the first row: cash[1, ]
Select the first column: cash[, 1]
Select the first column by name: cash[, "company"]
- Select the third row and second column of cash.
- Select the fifth row of the "year" column of cash.
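The bracket notation above can be sketched as follows. This is only a minimal example: the cash data frame is rebuilt by hand from the values printed elsewhere in this chapter, which is an assumption about how it was originally created.

```r
# Assumption: cash is rebuilt from the values printed in this chapter;
# the original course may have created it differently.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5),
  stringsAsFactors = FALSE
)

first_row    <- cash[1, ]          # row 1, all columns (a one-row data frame)
first_col    <- cash[, 1]          # all rows, column 1 (a vector)
company_col  <- cash[, "company"]  # the same column, selected by name

third_second <- cash[3, 2]         # row 3, column 2
fifth_year   <- cash[5, "year"]    # row 5 of the "year" column

print(first_row)
print(third_second)
print(fifth_year)
```

Leaving the row or column position empty, as in cash[1, ] or cash[, 1], means "take everything" along that dimension.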
Accessing and subsetting data frames (2)
As you might imagine, selecting a specific column from a data frame is a common manipulation. So common, in fact, that it was given its own shortcut, the $. The following return the same answer:
cash$cash_flow
[1] 1000 4000  550 1500 1100  750 6000
cash[,"cash_flow"]
[1] 1000 4000  550 1500 1100  750 6000
Useful right? Try it out!
- Select a column of cash using $.
- Select a numeric column of cash using $ and multiply it by 2.
- You can delete a column by assigning it NULL. Run the code that deletes a column.
- Now print out cash again.
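A minimal sketch of the $ shortcut and of deleting a column with NULL follows. The cash data frame is rebuilt from the values printed in this chapter (an assumption), and the columns picked here, including the one that gets deleted, are illustrative choices rather than the course's own answers.

```r
# Assumption: cash is rebuilt from the values printed in this chapter.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5),
  stringsAsFactors = FALSE
)

# $ and [ , "name"] return the same vector.
same_answer <- identical(cash$cash_flow, cash[, "cash_flow"])
print(same_answer)

# Arithmetic on a selected column applies element by element.
doubled <- cash$cash_flow * 2
print(doubled)

# Deleting a column: assign NULL to it. Working on a copy keeps cash intact;
# dropping year here is only an illustrative choice.
cash_copy <- cash
cash_copy$year <- NULL
print(cash_copy)
```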
Accessing and subsetting data frames (3)
Often, simply selecting a column from a data frame is not all you want to do. What if you are only interested in the cash flows from company A? For more flexibility, try subset()!
subset(cash, company == "A")
  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4
There are a few important things happening here:
- The first argument you pass to subset() is the name of your data frame, cash.
- Notice that you shouldn't put company in quotes!
- == is the equality operator. It tests to find where two things are equal, and returns a logical vector. There is a lot more to learn about these relational operators, and you can learn all about them in the second finance course, Intermediate R for Finance!
- Use subset() to select only the rows of cash corresponding to company B.
- Use subset() to select the rows that have cash flows due in 1 year.
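The points above can be sketched in a short example. The cash data frame is again rebuilt from the values printed in this chapter, which is an assumption about the original data.

```r
# Assumption: cash is rebuilt from the values printed in this chapter.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5),
  stringsAsFactors = FALSE
)

# On its own, the comparison returns one TRUE/FALSE value per row.
is_a <- cash$company == "A"
print(is_a)

# subset() keeps the rows where the condition is TRUE.
# Column names are written without quotes inside subset().
company_b <- subset(cash, company == "B")
due_in_1  <- subset(cash, year == 1)

print(company_b)
print(due_in_1)
```

Note that subset() keeps the original row names, so the rows for company B are still labelled 4 to 7.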
Adding new columns
In a perfect world, you could be 100% certain that you will receive all of your cash flows. But, since these are predictions about the future, there is always a chance that someone won't be able to pay! You decide to run some analysis about a worst case scenario where you only receive half of your expected cash flow. To save the worst case scenario for later analysis, you decide to add it as a new column to the data frame!
cash$half_cash <- cash$cash_flow * .5
cash
  company cash_flow year half_cash
1       A      1000    1       500
2       A      4000    3      2000
3       A       550    4       275
4       B      1500    1       750
5       B      1100    2       550
6       B       750    4       375
7       B      6000    5      3000
And that's it! Creating new columns in your data frame is as simple as assigning the new information to data_frame$new_column. Often, the newly created column is some transformation of existing columns, so the $ operator really comes in handy here!
- Create a new worst-case scenario where you only receive 25% of your expected cash flow, and add it to the data frame as a new column.
- What if it took twice as long (in terms of year) to receive your money? Add a new column double_year with this scenario.
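One way these scenarios might look in code is sketched below. The cash data frame is rebuilt from the values printed in this chapter (an assumption), and quarter_cash is a hypothetical column name chosen only for this example.

```r
# Assumption: cash is rebuilt from the values printed in this chapter.
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5),
  stringsAsFactors = FALSE
)

# 50% scenario, as in the text.
cash$half_cash <- cash$cash_flow * 0.5

# 25% scenario; quarter_cash is a hypothetical name for this sketch.
cash$quarter_cash <- cash$cash_flow * 0.25

# Payment takes twice as long.
cash$double_year <- cash$year * 2

print(cash)
```

Each assignment adds one column on the right of the data frame, leaving the existing columns unchanged.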