R provides several ways to sort data in ascending or descending order. Data analysts and data scientists use order(), sort(), and packages like dplyr, depending on the structure of the data.
TL;DR
- Use sort() to directly sort a vector and return the sorted values
- Use order() to get the indices that would sort a vector, which is ideal for sorting data frames
- Use dplyr::arrange() for a modern, readable approach to sorting data frames
- Handle NA values with the na.last parameter (TRUE puts NAs at end, FALSE at beginning)
- Sort in descending order with decreasing = TRUE or by using a minus sign
order() returns the indices that sort a vector. The same indices can reorder the rows of a matrix or a data frame in ascending or descending order, which is shown in the data frame section later in this tutorial.
The sort() function
Before diving into order(), let's look at the simpler sort() function. While order() returns indices, sort() directly returns the sorted values:
# Create a vector
values <- c(4, 12, 6, 7, 2, 9, 5)
# sort() returns the sorted values directly
sort(values)
[1] 2 4 5 6 7 9 12
The key difference is that sort() gives you the actual sorted values, while order() gives you the positions. This makes order() more useful for sorting data frames, as we'll see later.
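As a small sketch of the other sort() options (the vector name scores is made up for illustration), sort() also accepts decreasing and na.last. By default it drops NA values rather than placing them anywhere:

```r
# Hypothetical vector with one missing value
scores <- c(4, 12, NA, 7, 2)

# Default: ascending order, NA values are dropped
asc <- sort(scores)

# Descending order (NA values are still dropped)
desc_scores <- sort(scores, decreasing = TRUE)

# Keep NA values and place them at the end
with_na <- sort(scores, na.last = TRUE)

print(asc)
print(desc_scores)
print(with_na)
```

Dropping NA by default is one more way sort() differs from order(), which keeps every index unless told otherwise.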
Syntax of order()
The syntax of order() is shown below:
order(..., na.last = TRUE, decreasing = FALSE, method = c("auto", "shell", "radix"))
The arguments of order() are:
- ...: one or more vectors of the same length to sort by, such as a single vector x or the columns of a data frame.
- decreasing: logical value; TRUE sorts in descending order, FALSE (the default) sorts in ascending order.
- na.last: logical value; TRUE (the default) puts NA indices last, FALSE puts them first, and NA removes them.
- method: the sorting method to use.
order() in R
Let's look at an example of order() in action.
The code below creates a vector y containing a list of numbers. I'll use order(y) to get the indices that would sort these numbers.
y <- c(4, 12, 6, 7, 2, 9, 5)
order(y)
The above code gives the following output:
5 1 7 3 4 6 2
Here order() returns the indices of the values in ascending order. The smallest value, 2, is at index 5, so 5 comes first; the next smallest, 4, is at index 1; and the pattern continues for the remaining values.
y <- c(4, 12, 6, 7, 2, 9, 5)
y[order(y)]
The above code gives the following output:
2 4 5 6 7 9 12
Here the indices returned by order() are used to subset y, so the actual values are printed in ascending order. In other words, order(y) computes the positions, and y[...] then picks out the value at each position.
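The reason indices are useful is that they can reorder other vectors in step with the one being sorted. A minimal sketch, where the labels vector is a hypothetical companion to y that is not part of the original example:

```r
# y is the vector from the example; labels is a made-up parallel vector
y <- c(4, 12, 6, 7, 2, 9, 5)
labels <- c("a", "b", "c", "d", "e", "f", "g")

idx <- order(y)               # positions that sort y in ascending order

sorted_values <- y[idx]       # same values that sort(y) returns
sorted_labels <- labels[idx]  # labels rearranged to stay aligned with y

print(sorted_values)
print(sorted_labels)
```

This is the same idea used later to sort all columns of a data frame by one column.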
Sorting vector using different parameters in order()
Let's look at an example where the data contains missing values, which R represents as NA (not available).
order(x,na.last=TRUE)
x <- c(8, 2, 4, 1, -4, NA, 46, 8, 9, 5, 3)
order(x, na.last = TRUE)
The above code gives the following output:
5 4 2 11 3 10 1 8 9 7 6
Here order() again returns the indices in ascending order of value. Since NA is present at index 6, that index is placed last because of na.last=TRUE.
order(x,na.last=FALSE)
order(x, na.last = FALSE)
The above code gives the following output:
6 5 4 2 11 3 10 1 8 9 7
Here order() again returns the indices in ascending order of value. Since NA is present, its index, 6, is placed first because of na.last=FALSE.
order(x,decreasing=TRUE,na.last=TRUE)
order(x, decreasing = TRUE, na.last = TRUE)
The above code gives the following output:
7 9 1 8 10 3 11 2 4 5 6
Here order() returns the indices in descending order of value because of decreasing=TRUE: the largest value, 46, is at index 7, so 7 comes first, and the remaining indices follow in decreasing order of value. Since NA is present, its index, 6, is placed last because of na.last=TRUE.
order(x,decreasing=FALSE,na.last=FALSE)
order(x, decreasing = FALSE, na.last = FALSE)
The above code gives the following output:
6 5 4 2 11 3 10 1 8 9 7
Here the index of NA, 6, is placed first because of na.last=FALSE. The remaining indices are in ascending order of value because of decreasing=FALSE: the smallest value, -4, is at index 5, so 5 comes next, and the other values follow in increasing order.
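Two further variations can be sketched with the same vector x: negating a numeric vector as an alternative to decreasing = TRUE, and passing na.last = NA to drop the NA position entirely. The variable names desc_idx and no_na_idx are chosen only for this illustration:

```r
x <- c(8, 2, 4, 1, -4, NA, 46, 8, 9, 5, 3)

# Negating a numeric vector reverses the sort direction
desc_idx <- order(-x)

# na.last = NA removes the positions of NA values from the result
no_na_idx <- order(x, na.last = NA)

print(desc_idx)
print(no_na_idx)
print(x[no_na_idx])
```

The minus sign only works for numeric input; for character vectors, decreasing = TRUE is the safer choice.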
Sorting a dataframe by using order()
Let's create a data frame with a population of 10 rows. The variable gender is built with sample(), which draws 10 values from 'male' and 'female'; replace = TRUE samples with replacement, so values can repeat. Similarly, age draws values from 25 to 75, and degree draws from c("MA", "ME", "BE", "BSCS"), again with replacement.
Task: Sort the data in ascending order by age.
Note: The sample data shown may differ from what you see on your local machine, because sample() draws new random values each time the code runs.
population <- 10
gender <- sample(c("male", "female"), population, replace = TRUE)
age <- sample(25:75, population, replace = TRUE)
degree <- sample(c("MA", "ME", "BE", "BSCS"), population, replace = TRUE)
final_data <- data.frame(gender = gender, age = age, degree = degree)
final_data
The above code gives the following output, which shows a newly created dataframe.
gender age degree
male 40 MA
female 57 BSCS
male 66 BE
female 61 BSCS
female 48 MA
male 25 MA
female 49 BE
male 52 ME
female 57 MA
female 35 MA
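If you want the same random data on every run, one option is to fix the random seed before sampling. The sketch below wraps the same sample() calls in a hypothetical helper, make_data(), and uses an arbitrary seed of 42; the values it produces will not match the table shown above.

```r
# Fixing the seed makes sample() return the same draws on every run.
# make_data() and the seed value 42 are assumptions made for this sketch.
make_data <- function(seed, population = 10) {
  set.seed(seed)
  data.frame(
    gender = sample(c("male", "female"), population, replace = TRUE),
    age    = sample(25:75, population, replace = TRUE),
    degree = sample(c("MA", "ME", "BE", "BSCS"), population, replace = TRUE)
  )
}

first_run  <- make_data(42)
second_run <- make_data(42)

# Check that both runs produced the same data frame
print(identical(first_run, second_run))
```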
Let's sort the dataframe in the ascending order by using order() based on the variable age.
order(final_data$age)
The above code gives the following output:
6 10 1 5 7 8 2 9 4 3
Age 25 is at index 6, followed by age 35 at index 10, age 40 at index 1, and so on, so the indices list the ages in ascending order.
The code below passes order(final_data$age) as the row index inside [], which arranges the rows in ascending order of age while keeping the gender and degree information for each row.
final_data[order(final_data$age), ]
The above code gives the following output:
gender age degree
6 male 25 MA
10 female 35 MA
1 male 40 MA
5 female 48 MA
7 female 49 BE
8 male 52 ME
2 female 57 BSCS
9 female 57 MA
4 female 61 BSCS
3 male 66 BE
The output above shows the rows arranged in ascending order of age, each with its corresponding gender and degree.
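order() also accepts several sort keys, which gives multi-column sorting in base R. The sketch below rebuilds the data frame from the sample output shown in this tutorial (so it is reproducible) and sorts by gender, then by age in descending order using a minus sign:

```r
# Data copied from the sample output in this tutorial
final_data <- data.frame(
  gender = c("male", "female", "male", "female", "female",
             "male", "female", "male", "female", "female"),
  age    = c(40, 57, 66, 61, 48, 25, 49, 52, 57, 35),
  degree = c("MA", "BSCS", "BE", "BSCS", "MA",
             "MA", "BE", "ME", "MA", "MA")
)

# Sort by gender (ascending), then by age (descending) within each gender.
# The minus sign works here because age is numeric.
sorted <- final_data[order(final_data$gender, -final_data$age), ]
print(sorted)
```

Later keys are only consulted to break ties in earlier ones, so rows with the same gender and age keep their original relative order.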
Sorting With dplyr's arrange()
For a more readable and modern approach, you can use the arrange() function from the dplyr package. This is especially useful when working with the tidyverse ecosystem:
# Install dplyr if needed: install.packages("dplyr")
library(dplyr)
# Sort by age in ascending order
final_data %>% arrange(age)
# Sort by age in descending order
final_data %>% arrange(desc(age))
# Sort by multiple columns
final_data %>% arrange(gender, desc(age))
The arrange() function is often preferred for data analysis workflows because it integrates seamlessly with other dplyr functions like filter(), select(), and mutate() using the pipe operator.
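A short sketch of such a pipeline, assuming dplyr is installed and again using the data from the sample output in this tutorial. The age threshold of 45 is an arbitrary choice for illustration:

```r
library(dplyr)

# Data copied from the sample output in this tutorial
final_data <- data.frame(
  gender = c("male", "female", "male", "female", "female",
             "male", "female", "male", "female", "female"),
  age    = c(40, 57, 66, 61, 48, 25, 49, 52, 57, 35),
  degree = c("MA", "BSCS", "BE", "BSCS", "MA",
             "MA", "BE", "ME", "MA", "MA")
)

# Keep rows with age above 45, keep two columns,
# then sort by degree and by age (descending) within each degree
result <- final_data %>%
  filter(age > 45) %>%
  select(age, degree) %>%
  arrange(degree, desc(age))

print(result)
```

Each step reads top to bottom, which is the main readability argument for arrange() over nested order() calls.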
Comparison of Sorting Methods
| Function | Returns | Best For | Package |
|---|---|---|---|
| sort() | Sorted values | Vectors | Base R |
| order() | Indices | Data frames, complex sorting | Base R |
| arrange() | Sorted data frame | Tidyverse workflows | dplyr |
| setorder() | Modifies in place | Large datasets (memory efficient) | data.table |
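Since setorder() is not demonstrated elsewhere in this tutorial, here is a minimal sketch of it, assuming the data.table package is installed. The table dt and its columns are made up for illustration:

```r
library(data.table)

# Hypothetical data for this sketch
dt <- data.table(
  name = c("a", "b", "c", "d"),
  age  = c(40, 25, 66, 25)
)

# setorder() reorders the rows of dt by reference:
# no copy is made and no reassignment is needed.
# Sort by age descending, then by name ascending.
setorder(dt, -age, name)

print(dt)
```

Unlike order() and arrange(), the call changes dt itself, so keep a copy first if the original row order matters.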
Conclusion
In this tutorial, I covered the essential methods for sorting data in R:
- sort() for directly sorting vectors
- order() for getting indices to sort vectors and data frames
- Handling NA values with the na.last parameter
- dplyr::arrange() for modern, readable data frame sorting
- Multi-column sorting for complex ordering requirements
For more information, see the official R documentation for order().
If you would like to learn more about R, take DataCamp's Introduction to R course. You can also explore our data.table tutorial for high-performance data manipulation, or check out our guide on pipes in R for cleaner code.