Subsetting in R Tutorial

Find out how to access your dataframe's data with subsetting. Learn how to subset by using brackets or by using R's subset() function.

Updated Dec 2, 2024 · 4 min read

Subsetting in R is a useful indexing feature for accessing object elements. It can be used to select and filter variables and observations. The two primary methods for subsetting data in R are brackets [], which are a general indexing method, and the subset() function, which is a higher-level and more user-friendly method.

If you want to explore more about data subsetting and other R programming techniques, start with our Introduction to R course today. You will be surprised at how fast you pick it up. If you have more experience, our Intermediate R is another great option.

Selecting Rows

Here is an example of subsetting on a dataframe called debt.

debt[3:6, ]

      name  payment
3      Dan      150
4      Rob       50
5      Rob       75
6      Rob      100

Here, we selected rows 3 through 6 of debt.
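To run this example yourself, you first need a debt dataframe. The tutorial does not show how debt was created, so the sketch below rebuilds it from the printed outputs (an assumption about the original data) and then selects the same rows.

```r
# Assumed reconstruction of `debt`, inferred from the outputs shown in this tutorial
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100),
  stringsAsFactors = FALSE
)

# Rows 3 through 6; leaving the slot after the comma empty keeps every column
rows_3_to_6 <- debt[3:6, ]
print(rows_3_to_6)
```

The value before the comma picks rows and the value after it picks columns, so debt[3:6, ] reads as "rows 3 to 6, all columns".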

Selecting rows from a specific column

Note the simplification that happens when you select a single column. Selecting the first three rows of just the payment column returns a vector rather than a dataframe.

debt[1:3, 2]

100 200 150

Dataframe formatting

To keep the result as a dataframe, add drop = FALSE as shown below. R is case-sensitive, so the value must be written FALSE, not False.

debt[1:3, 2, drop = FALSE]

    payment
1       100
2       200
3       150
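One way to see the difference between the two results is to check their types. This is a minimal sketch that assumes debt has the six rows shown in this tutorial's outputs; the variable names as_vector and as_frame are made up for illustration.

```r
# Assumed reconstruction of `debt`, inferred from the outputs shown in this tutorial
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100),
  stringsAsFactors = FALSE
)

# Single column without drop = FALSE: simplified to a plain numeric vector
as_vector <- debt[1:3, 2]

# Same selection with drop = FALSE: still a one-column dataframe
as_frame <- debt[1:3, 2, drop = FALSE]

print(is.data.frame(as_vector))
print(is.data.frame(as_frame))
```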

Selecting a specific column

To select a specific column, you can also type in the name of the dataframe, followed by a $, and then the name of the column you are looking to select. In this example, we will be selecting the payment column of the dataframe. When running this script, R will simplify the result as a vector.

debt$payment

100 200 150 50 75 100

Using the subset() function

When you need more complex subsets, or a subset based on a condition, the next step up is the subset() function. For example, what if you wanted to look at the debt from someone named Dan? You could use brackets to select their rows by position and total them up, but that approach is not robust: it breaks as soon as rows are added, removed, or reordered in the data set.

# This works, but is not informative
debt[1:3, ]

subset() on a categorical variable

A better way to do this is to use the subset() function to select the rows where the name column is equal to Dan. Notice that the comparison needs a double equals sign, ==, which is a relational operator.

# This works, but is not informative nor robust
debt[1:3, ]

# Much more informative!
subset(debt, name == "Dan")

      name     payment
1      Dan         100
2      Dan         200
3      Dan         150
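Totaling Dan's debt, as mentioned earlier, could then look like the sketch below. It assumes debt has the six rows shown in this tutorial's outputs; dan_rows and dan_total are illustrative names.

```r
# Assumed reconstruction of `debt`, inferred from the outputs shown in this tutorial
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100),
  stringsAsFactors = FALSE
)

# Select Dan's rows by condition, not by position
dan_rows <- subset(debt, name == "Dan")

# Add up the payment column of the filtered rows
dan_total <- sum(dan_rows$payment)
print(dan_total)
```

Because the rows are chosen by name, the total stays correct even if Dan's rows move elsewhere in the dataframe.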

subset() on a numeric variable

You can also subset on numeric columns. To see the rows where the payment equals $100, do the following:

subset(debt, payment == 100)

      name  payment
1      Dan      100
6      Rob      100
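The same filter can be written with brackets by passing a logical condition as the row index, and conditions can be combined with &. The sketch below assumes debt has the six rows shown in this tutorial's outputs; the result names are illustrative.

```r
# Assumed reconstruction of `debt`, inferred from the outputs shown in this tutorial
debt <- data.frame(
  name    = c("Dan", "Dan", "Dan", "Rob", "Rob", "Rob"),
  payment = c(100, 200, 150, 50, 75, 100),
  stringsAsFactors = FALSE
)

# subset() version: column names are used directly
by_subset <- subset(debt, payment == 100)

# Bracket version: the column must be referenced through the dataframe
by_bracket <- debt[debt$payment == 100, ]

# Both approaches pick the same rows
print(identical(rownames(by_subset), rownames(by_bracket)))

# Two conditions at once: Rob's payments of at least 75
rob_large <- subset(debt, name == "Rob" & payment >= 75)
print(rob_large)
```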

Accessing and Subsetting Dataframes

Moving to this next example, what if you are only interested in the cash flows from company A?

subset(cash, company == "A")

      company  cash_flow  year
1           A       1000     1
2           A       4000     3
3           A        550     4

Remember:

The first argument you pass to subset() is the name of your dataframe, cash.
Notice that you shouldn't put company in quotes!
The == is the equality operator. It tests to find where two things are equal and returns a logical vector.
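To make the logical vector visible, the sketch below evaluates the comparison on its own before filtering. The cash dataframe is not defined in this tutorial, so it is rebuilt here from the printed outputs, on the assumption that it contains exactly those seven rows; is_company_a is an illustrative name.

```r
# Assumed reconstruction of `cash`, inferred from the outputs shown in this tutorial
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5),
  stringsAsFactors = FALSE
)

# The comparison alone returns one TRUE/FALSE value per row
is_company_a <- cash$company == "A"
print(is_company_a)

# subset() keeps the rows where that condition is TRUE
company_a <- subset(cash, company == "A")
print(company_a)
```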

Interactive Example of the subset() Function

In the example below, you will use the subset() function to select only the rows of cash corresponding to company B, and then the rows that have cash flows due in 1 year.

# Rows about company B
subset(cash, company == "B")

# Rows with cash flows due in 1 year
subset(cash, year == 1)

When you run the above code, it produces the following result:

  company cash_flow year
4       B      1500    1
5       B      1100    2
6       B       750    4
7       B      6000    5

  company cash_flow year
1       A      1000    1
4       B      1500    1
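subset() can also limit which columns are returned, through its select argument. The sketch below is an illustration beyond the original example; it assumes cash contains exactly the seven rows shown in this tutorial's outputs, and b_flows is an illustrative name.

```r
# Assumed reconstruction of `cash`, inferred from the outputs shown in this tutorial
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5),
  stringsAsFactors = FALSE
)

# Filter rows for company B and keep only the cash_flow and year columns
b_flows <- subset(cash, company == "B", select = c(cash_flow, year))
print(b_flows)
```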


To learn more about accessing and subsetting dataframes in R, see our course Introduction to R for Finance.

This content is taken from DataCamp’s Introduction to R for Finance course by Lore Dirick.

Final Thoughts on Subsetting

Part of the fun of R is that it offers different methods for performing similar tasks. Subsetting is no exception: both brackets [] and the subset() function can select the same rows and columns. You can choose between the two, depending on whether you prefer low-level control or high-level simplicity.

Consider advancing your skills with our Machine Learning Scientist in R career track. You will deepen your understanding of R's core functionality, but also you will be equipped with advanced techniques to tackle machine learning problems.
