Finding the median is a fundamental part of data analysis, especially when you're dealing with skewed distributions or outliers.
In R, the median() function offers a simple, built-in way to calculate this important, non-parametric measure of central tendency. I’ll show you the ropes:
What Does R median() Do?
The median() function takes your numeric data and returns its central value, the point that splits the sorted values into two halves.
median(numeric_vector)
Here, numeric_vector refers to a numeric vector or a similar object, such as a column within a data frame.
A Simple Example Using R median()
Let’s see this in action. Suppose you have a small vector of numbers:
sales_numbers <- c(3, 5, 8, 2, 7)
median(sales_numbers)
R automatically sorts the numbers (2, 3, 5, 7, 8), then returns 5. This is the value sitting right in the center.
But what if your vector has an even number of elements, like in the next example?
quarterly_sales <- c(10, 4, 7, 2)
median(quarterly_sales)
Here, R sorts to (2, 4, 7, 10) and then averages the two middle numbers (4 and 7), yielding 5.5.
If your dataset contains an odd number of values, median() simply picks the one in the center. For an even number of values, it calculates the average of the two middle numbers.
So under the hood, median() handles the ordering for you, which is very convenient.
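To make the odd/even rule concrete, here is a minimal hand-rolled sketch of the same logic. The name manual_median is my own, purely for illustration; this is not how R implements median() internally, and it skips edge cases such as missing values.

```r
# Illustrative helper (hypothetical name): mirrors the rule described above.
manual_median <- function(x) {
  sorted <- sort(x)        # order the values from smallest to largest
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]    # odd count: take the single middle value
  } else {
    mean(sorted[c(n / 2, n / 2 + 1)])  # even count: average the two middle values
  }
}

manual_median(c(3, 5, 8, 2, 7))  # 5, same as median()
manual_median(c(10, 4, 7, 2))    # 5.5, same as median()
```

In practice you should stick with the built-in median(); the sketch is only there to show what the function is doing conceptually.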
Handling NA Values with median()
As your datasets grow, it’s common to encounter missing values. These can trip up your calculations if you’re not careful. By default, median() will return NA if even a single missing value is present in your vector. Let’s see how this plays out:
monthly_sales <- c(6, 3, NA, 9)
median(monthly_sales)
Notice that the result is NA. To sidestep this, add the na.rm = TRUE argument. This tells R to remove missing values before calculating the median:
median(monthly_sales, na.rm = TRUE)
This time, you’ll get 6, the median of the remaining values (3, 6, and 9). Keep an eye out for those NAs, as they can sneak into your data and throw off your analysis if you forget them.
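Before reaching for na.rm = TRUE, it can help to check how many values are actually missing. Here is a small sketch using the same monthly_sales vector; the variable names n_missing, median_all, and median_clean are my own.

```r
monthly_sales <- c(6, 3, NA, 9)

# Count the missing values so you know how much data na.rm would drop
n_missing <- sum(is.na(monthly_sales))

median_all   <- median(monthly_sales)                # NA, because one value is missing
median_clean <- median(monthly_sales, na.rm = TRUE)  # 6, computed from 3, 6, and 9

n_missing  # 1
```

If a large share of the values turns out to be missing, the median of what remains may not represent the full dataset well, so the count is worth a glance.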
Finding the Median by Group in R
Let's try something new: comparing medians across different groups.
A common question: What if we want to see the median income by region? Just like we did with simple vectors, R makes this easy, but now we’ll need to group our data first.
One quick way in base R is with tapply():
income_values <- c(40000, 50000, 45000, 35000, 60000, 30000)
region_labels <- c("East", "West", "West", "East", "East", "West")
tapply(income_values, region_labels, median)
This computes the median income for each region.
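Grouped data can contain missing values too. As a sketch, tapply() passes any extra arguments on to the function it applies, so na.rm = TRUE can be handed through to median(). The vector below is a hypothetical variant of the income data with one value replaced by NA.

```r
# Hypothetical variant of the example data: the last West income is missing
incomes_with_na <- c(40000, 50000, 45000, 35000, 60000, NA)
region_labels   <- c("East", "West", "West", "East", "East", "West")

# Extra arguments after the function are forwarded to median()
group_medians <- tapply(incomes_with_na, region_labels, median, na.rm = TRUE)
group_medians
# East: 40000 (from 35000, 40000, 60000); West: 47500 (from 45000, 50000)
```

Without na.rm = TRUE, the West group would come back as NA while East would be unaffected.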
If you prefer working with the dplyr package, you can achieve the same result using pipe-friendly syntax:
library(dplyr)
income_data <- data.frame(income = income_values, region = region_labels)
income_data %>%
  group_by(region) %>%
  summarise(median_income = median(income))
This approach produces a tidy summary table, listing the median income for each region.
Grouped summaries like this are especially useful when you want to compare central tendencies across categories in your data.
Calculating the Median for Data Frame Columns in R
Sometimes, you might want to get the median for every column in a data frame at once, maybe for a quick scan of your dataset’s central values. Here I will use sapply().
test_scores <- data.frame(math = c(1, 2, 3), science = c(7, 8, 9))
sapply(test_scores, median)
This command returns the median of each column as a named vector, so you can compare the typical value of each variable at a glance. This technique is especially handy when you’re exploring new data and want a fast overview.
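Real data frames often mix numeric and non-numeric columns. One possible approach, sketched below, is to keep only the numeric columns before calling sapply(). The student column and its values are hypothetical additions to the example.

```r
test_scores <- data.frame(
  student = c("Ana", "Ben", "Cy"),  # hypothetical non-numeric column
  math    = c(1, 2, 3),
  science = c(7, 8, 9)
)

# Keep only the columns for which is.numeric() is TRUE
numeric_columns <- test_scores[sapply(test_scores, is.numeric)]

column_medians <- sapply(numeric_columns, median)
column_medians
# math: 2, science: 8
```

Filtering first means the text column never reaches median(), so you get a clean numeric result.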
Things to Watch Out For
Before we wrap up, let’s highlight a few common pitfalls and best practices to ensure smooth sailing with median():
- Check that you’re only passing numeric data to median(). A factor triggers an error, and a character vector will not give you a meaningful numeric median.
- Remember to use na.rm = TRUE if you suspect missing data.
- No need to sort your data manually. median() takes care of the ordering internally, so just pass your vector as-is.
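The first pitfall often shows up when numbers were read in as a factor. Here is a sketch of checking the type and converting before calling median(); the scores_factor data is made up for illustration.

```r
# Hypothetical data: numbers that ended up stored as a factor
scores_factor <- factor(c("10", "4", "7"))

is.numeric(scores_factor)  # FALSE, so median() would refuse it

# try() captures the error instead of stopping the script
attempt <- try(median(scores_factor), silent = TRUE)
factor_failed <- inherits(attempt, "try-error")  # TRUE

# Convert via character first; as.numeric() directly on a factor
# would return the internal level codes rather than the values
scores_numeric <- as.numeric(as.character(scores_factor))
median(scores_numeric)  # 7
```

The detour through as.character() matters: calling as.numeric() straight on the factor would silently give you the level codes, and the median of those would be wrong.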
Conclusion
The median() function is a great way to understand the true center of your data, especially when your dataset contains outliers that can distort the mean. It also copes with missing values through na.rm and combines easily with tools such as tapply() and dplyr for grouped summaries.
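To see how an outlier pulls the mean but not the median, here is a quick sketch with made-up numbers (the daily_orders vector is hypothetical):

```r
# Hypothetical data: four typical days and one extreme day
daily_orders <- c(30, 32, 35, 31, 500)

orders_mean   <- mean(daily_orders)    # 125.6, dragged up by the 500
orders_median <- median(daily_orders)  # 32, still in line with a typical day
```

The single extreme value roughly quadruples the mean, while the median stays put.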
For next steps, I also wrote a short article on the R mean() function, if you want to take a look. And remember to take our Exploratory Data Analysis in R course to build practical, job-ready skills.

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess!