Subsetting in R
Subsetting in R is a useful indexing feature for accessing object elements. It can be used to select and filter variables and observations. You can use brackets to select rows and columns from your dataframe.
  name payment
3  Dan     150
4  Rob      50
5  Rob      75
6  Rob     100
Here we selected rows 3 through 6 of debt. One thing to look at is the simplification that happens when you select a single column.
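The row selection above can be reproduced with a short sketch. The source does not show how debt was created, so the data.frame() call below is an assumption reconstructed from the printed values.

```r
# Assumed reconstruction of the debt data frame from the printed output
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100)
)

# Rows 3 through 6; leaving the slot after the comma empty keeps every column
rows_3_to_6 <- debt[3:6, ]
print(rows_3_to_6)
```

The original row numbers (3 to 6) are kept as row names in the result.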
Selecting Rows From a Specific Column
Selecting the first three rows of just the payment column simplifies the result into a vector.
100 200 150
To keep it as a dataframe, just add drop = FALSE as shown below:
debt[1:3, 2, drop = FALSE]
  payment
1     100
2     200
3     150
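To see the difference that drop makes, the two results can be compared side by side. This is a minimal sketch; the debt data frame is an assumed reconstruction from the printed output, and the variable names are illustrative.

```r
# Assumed reconstruction of the debt data frame from the printed output
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100)
)

# Default behaviour: a single column is simplified to a plain vector
as_vector <- debt[1:3, 2]

# drop = FALSE keeps the one-column result as a dataframe
as_frame <- debt[1:3, 2, drop = FALSE]

print(is.data.frame(as_vector))  # FALSE
print(is.data.frame(as_frame))   # TRUE
print(dim(as_frame))             # 3 rows, 1 column
```

The column can also be selected by name instead of position, for example debt[1:3, "payment", drop = FALSE].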
Selecting a Specific Column [Shortcut]
To select a specific column, you can also type in the name of the dataframe, followed by a $, and then the name of the column you are looking to select. In this example, we will be selecting the payment column of the dataframe. When running this script, R will simplify the result as a vector.
100 200 150 50 75 100
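The $ shortcut can be sketched as follows. The debt data frame is again an assumed reconstruction from the printed output. The double-bracket form is included for comparison because it returns the same vector.

```r
# Assumed reconstruction of the debt data frame from the printed output
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100)
)

# $ returns the column as a vector
payments <- debt$payment

# [[ ]] with the column name returns the same vector
same_payments <- debt[["payment"]]

print(payments)
print(identical(payments, same_payments))  # TRUE

# Because the result is a vector, it can go straight into vector functions
total <- sum(payments)
print(total)  # 675
```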
subset() for More Power
When you need a more complex subset, or a subset based on a condition, the next step up is the subset() function. For example, what if you wanted to look at debt from someone named Dan? You could use brackets to select their rows and total the payments, but that is not a very robust approach, especially if the data set changes.
# This works, but is not informative nor robust
debt[1:3, ]
A better way to do this is to use the subset() function to select the rows where the name column is equal to Dan. Notice that there needs to be a double equals sign, which is a relational operator.
# This works, but is not informative nor robust
debt[1:3, ]

# Much more informative!
subset(debt, name == "Dan")
  name payment
1  Dan     100
2  Dan     200
3  Dan     150
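To illustrate why selecting by condition is more robust than selecting by position, the sketch below reorders the rows and repeats both approaches. The debt data frame is an assumed reconstruction from the printed output, and the reordering is a hypothetical change to the data, not something taken from the source.

```r
# Assumed reconstruction of the debt data frame from the printed output
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100)
)

# Hypothetical change to the data: the rows arrive in a different order
shuffled <- debt[c(4, 1, 5, 2, 6, 3), ]

# Selecting by position now picks up a mix of Rob and Dan
by_position <- shuffled[1:3, ]
print(by_position$name)

# Selecting by condition still returns only Dan's rows
by_condition <- subset(shuffled, name == "Dan")
print(by_condition)

# Total of Dan's payments, independent of row order
dan_total <- sum(by_condition$payment)
print(dan_total)  # 450
```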
subset() Function on a Numeric Column
We can also subset on numeric columns. If you wanted to see the rows where the payment equals $100, you would do the following:
subset(debt, payment == 100)
  name payment
1  Dan     100
6  Rob     100
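Numeric conditions are not limited to equality. As a sketch that goes slightly beyond the example above, the other relational operators (such as >=) work the same way, and conditions can be combined with &. The debt data frame is an assumed reconstruction from the printed output, and the thresholds are illustrative.

```r
# Assumed reconstruction of the debt data frame from the printed output
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100)
)

# Rows where the payment is at least 100
at_least_100 <- subset(debt, payment >= 100)
print(at_least_100)

# Rows where both conditions hold: the name is Rob AND the payment is at least 75
rob_large <- subset(debt, name == "Rob" & payment >= 75)
print(rob_large)
```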
Accessing and Subsetting Dataframes
Moving to this next example, what if you are only interested in the cash flows from company A?
subset(cash, company == "A")
  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4
- The first argument you pass to subset() is the name of your dataframe, cash.
- Notice that you shouldn't put the column name, company, in quotes.
- == is the equality operator. It tests to find where two things are equal and returns a logical vector.
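The logical vector mentioned above can be inspected directly. The sketch below assumes a cash data frame reconstructed from the outputs printed in this article (the source does not show how it was created) and compares subset() with the equivalent bracket selection.

```r
# Assumed reconstruction of the cash data frame from the printed outputs
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# == compares every element and returns one TRUE/FALSE per row
is_a <- cash$company == "A"
print(is_a)

# The logical vector can be used inside brackets to keep the TRUE rows
with_brackets <- cash[is_a, ]

# subset() evaluates the condition inside the dataframe, so company needs no cash$ prefix
with_subset <- subset(cash, company == "A")

print(isTRUE(all.equal(with_brackets, with_subset)))  # TRUE
```

One difference worth knowing: when the condition evaluates to NA for a row, subset() drops that row, while bracket selection returns a row of NA values.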
Interactive Example of the subset() Function
In the example below, you will use subset() to select only the rows of cash corresponding to company B. Then, you will use subset() to select the rows that have cash flows due in 1 year.
# Rows about company B
subset(cash, company == "B")

# Rows with cash flows due in 1 year
subset(cash, year == 1)
When you run the above code, it produces the following result:
  company cash_flow year
4       B      1500    1
5       B      1100    2
6       B       750    4
7       B      6000    5

  company cash_flow year
1       A      1000    1
4       B      1500    1
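The two conditions from this example can also be combined in one call. The sketch below is an extension rather than part of the original exercise: it assumes a cash data frame reconstructed from the printed outputs, joins the conditions with &, and uses the select argument of subset() to keep only some columns.

```r
# Assumed reconstruction of the cash data frame from the printed outputs
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# Company B rows that are also due in 1 year
b_year_1 <- subset(cash, company == "B" & year == 1)
print(b_year_1)

# Company B rows, keeping only the cash_flow and year columns
b_flows <- subset(cash, company == "B", select = c(cash_flow, year))
print(b_flows)
```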
To learn more about accessing and subsetting dataframes in R, please see this video from our course Introduction to R for Finance.
This content is taken from DataCamp’s Introduction to R for Finance course by Lore Dirick.