Standard deviation is one of the most common ways to summarize how spread out your data is. In R, the sd() function gives you a quick way to calculate this measure of variability, whether you're working with vectors, data frames, or grouped data.
In this article, I'll walk through the basics of using sd() in R, explore how to handle missing values, and demonstrate how to compute standard deviations across groups using functions like tapply() and packages like dplyr.
What Does sd() Do in R?
In R, the sd() function from base R calculates the standard deviation of a numeric vector or data frame column. It measures how much the values in your data deviate from the mean, giving you a sense of variability or dispersion.
If you're familiar with the idea of variance, the standard deviation is simply its square root. Because variance is built from squared deviations, taking the square root brings the measure back to the same scale and units as the original data. This makes it easier to interpret in practical terms.
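To make the link with variance concrete, here is a minimal sketch that computes the sample standard deviation by hand and compares it with sd() and sqrt(var()). The vector values and the intermediate names are made up for illustration.

```r
# Hypothetical data, used only for illustration
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(values)
deviations <- values - mean(values)

# Sample variance: sum of squared deviations divided by n - 1
manual_var <- sum(deviations^2) / (n - 1)
manual_sd <- sqrt(manual_var)

# All three approaches should agree
all.equal(manual_sd, sd(values))
all.equal(manual_sd, sqrt(var(values)))
```

Writing it out once like this is mostly useful as a sanity check; in practice you would just call sd().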
A Simple sd() Example
Suppose you have a set of numbers and want to know how much they vary from the average.
Here’s how you’d do it in R:
exam_scores <- c(75, 80, 85, 90, 95)
sd(exam_scores)
The output, about 7.91, is the standard deviation, which describes the typical distance of the scores from their mean of 85.
You can use sd() with any numeric vector, including integers, doubles, or the results of calculations. Now, you might know that R treats logical vectors as numeric (TRUE as 1, FALSE as 0), so the sd() function would technically work on logical vectors (you can try it out if you want), but the idea of standard deviation is most meaningful for continuous numeric data.
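If you do want to try it, here is a quick sketch. The logical vector passed is hypothetical, and the comparison simply confirms that sd() handles TRUE and FALSE as 1 and 0.

```r
# Hypothetical logical vector (e.g., whether each student passed)
passed <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

sd_logical <- sd(passed)
sd_numeric <- sd(as.numeric(passed))

# Both calls give the same result because TRUE/FALSE are coerced to 1/0
all.equal(sd_logical, sd_numeric)
```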
Handling Missing Values with sd()
Real-world datasets often contain missing values, so this is worth mentioning: If your data includes any NA values, sd() will return NA by default.
Consider this example:
heights <- c(170, 175, NA, 180, 185)
sd(heights)
The result is NA.
To ignore missing values and calculate the standard deviation of the available numbers, use the na.rm = TRUE argument. Alternatively, you can remove the NA values from your dataset beforehand, but I'd advise that only if it makes sense to do so.
sd(heights, na.rm = TRUE)
Now you'll get the standard deviation for 170, 175, 180, and 185, which is about 6.45.
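As a sketch of the two routes mentioned above, the following compares na.rm = TRUE with dropping the missing values yourself before calling sd(). The heights vector repeats the example data; heights_complete is a name I've introduced for illustration.

```r
heights <- c(170, 175, NA, 180, 185)

# Option 1: let sd() skip the missing value
sd_na_rm <- sd(heights, na.rm = TRUE)

# Option 2: drop the missing value yourself, then call sd()
heights_complete <- heights[!is.na(heights)]
sd_manual <- sd(heights_complete)

# It is also worth checking how many values were missing
sum(is.na(heights))

all.equal(sd_na_rm, sd_manual)
```

Both routes give the same number here. Counting the missing values first is a cheap way to notice when a result rests on far fewer observations than you expected.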
Calculating Standard Deviation for Data Frames in R
You can easily calculate the standard deviation of a specific column in a data frame using the $ operator. Suppose you have a data frame of product weights:
product_data <- data.frame(weight = c(1.2, 1.5, 1.3, 1.7, 1.4), price = c(10, 12, 11, 13, 12))
sd(product_data$weight)
This gives you the standard deviation of the weight column. Remember, sd() is designed for numeric vectors, not entire data frames. Always select the appropriate column.
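If you need the standard deviation of several columns at once, one common approach is to apply sd() column by column with sapply(). A minimal sketch, reusing the product_data example and adding a hypothetical non-numeric column (category) to show why it helps to select the numeric columns first:

```r
product_data <- data.frame(
  weight = c(1.2, 1.5, 1.3, 1.7, 1.4),
  price = c(10, 12, 11, 13, 12),
  category = c("a", "b", "a", "b", "a")  # hypothetical column for illustration
)

# Keep only the numeric columns, then apply sd() to each one
numeric_columns <- product_data[sapply(product_data, is.numeric)]
column_sds <- sapply(numeric_columns, sd)
column_sds
```

The result is a named numeric vector with one entry per numeric column.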
sd() and Grouped Standard Deviations
Often, you’ll want to measure variability within groups, such as by category, region, or by some other factor. R provides several ways to compute grouped standard deviations. I'll show three:
Using tapply() for grouped standard deviations
Suppose you have sales data for different regions and want the standard deviation for each region:
sales_amount <- c(200, 220, 210, 250, 240, 230)
region <- c("North", "North", "South", "South", "North", "South")
tapply(sales_amount, region, sd)
tapply() applies sd() to each group defined by region.
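Grouped data can contain missing values too. tapply() passes any extra arguments after the function on to that function, so na.rm = TRUE can be supplied there. A small sketch with a hypothetical NA added to the sales figures:

```r
# Hypothetical sales data with one missing value
sales_amount <- c(200, 220, NA, 250, 240, 230)
region <- c("North", "North", "South", "South", "North", "South")

# Without na.rm, the group containing the NA returns NA
tapply(sales_amount, region, sd)

# Extra arguments after the function are passed on to sd()
group_sds <- tapply(sales_amount, region, sd, na.rm = TRUE)
group_sds
```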
Using aggregate() for grouped standard deviations
If you prefer your results in a tidy data frame, aggregate() works:
sales_data <- data.frame(region = c("North", "North", "South", "South", "North", "South"), amount = c(200, 220, 210, 250, 240, 230))
aggregate(amount ~ region, data = sales_data, sd)
This produces a summary data frame with the standard deviation for each region.
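A standard deviation is usually easier to read next to the group mean. One way to get both from aggregate() is to pass a function that returns a named vector. This is a sketch rather than the only way to do it, and region_stats is a name I've made up:

```r
sales_data <- data.frame(
  region = c("North", "North", "South", "South", "North", "South"),
  amount = c(200, 220, 210, 250, 240, 230)
)

# The anonymous function returns two statistics per group
region_stats <- aggregate(
  amount ~ region,
  data = sales_data,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)
region_stats

# The statistics are stored as a matrix inside the amount column
region_stats$amount[, "sd"]
```

Note the slightly unusual shape of the result: the two statistics come back as a matrix column rather than as two separate columns, which is why the last line indexes into region_stats$amount.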
Using dplyr for grouped standard deviations
With the dplyr package, grouped calculations are even more readable. Use summarize() after grouping:
library(dplyr)
sales_data %>%
  group_by(region) %>%
  summarize(sd_amount = sd(amount))
This approach is especially useful for larger datasets or when chaining multiple data transformations. (Or if you simply prefer using the pipe operator, like I do.)
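Because summarize() can compute several statistics in one call, it is easy to report the group size and mean alongside the standard deviation. A sketch, assuming dplyr is installed; the column names n_sales and mean_amount are my own choices:

```r
library(dplyr)

sales_data <- data.frame(
  region = c("North", "North", "South", "South", "North", "South"),
  amount = c(200, 220, 210, 250, 240, 230)
)

# Report group size and mean next to the standard deviation
region_summary <- sales_data %>%
  group_by(region) %>%
  summarize(
    n_sales = n(),
    mean_amount = mean(amount),
    sd_amount = sd(amount)
  )
region_summary
```

Including the group size helps you spot groups with a single observation, where sd() returns NA.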
Common Mistakes and Tips
Even a simple function like sd() can cause issues if you’re not careful. Here are some common pitfalls:
- Non-numeric data: sd() only works with numeric vectors. If your data includes characters or factors, you'll get an error or an NA with a coercion warning rather than a meaningful result. Use is.numeric() to check your data.
- Missing values: A single NA will cause sd() to return NA. Use na.rm = TRUE if you want to ignore missing values.
- Small sample sizes: sd() uses n-1 in the denominator (sample standard deviation). For vectors of length 1, sd() returns NA, not 0.
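A few lines are enough to see these pitfalls in practice. The survey_answers vector is hypothetical and stands in for numbers that were read in as text:

```r
# Length-1 input: the n - 1 denominator is zero, so the result is NA
sd(42)

# Check the type before calling sd(); the vector below is hypothetical
survey_answers <- c("10", "12", "15")
is.numeric(survey_answers)

# Convert explicitly if the values really are numbers stored as text
survey_numeric <- as.numeric(survey_answers)
sd(survey_numeric)
```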
That last point in particular, I think, is interesting. It's a lesser-known difference, and it can trip you up if you are moving between Python and R: NumPy's np.std() divides by n by default (the population standard deviation), whereas R's sd() divides by n-1 (the sample standard deviation).
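If you need the population version (dividing by n) to match results from another tool, sd() has no argument for that, since it only takes x and na.rm, but the calculation is short. A minimal sketch; population_sd is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: population standard deviation (divides by n)
population_sd <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sqrt(mean((x - mean(x))^2))
}

exam_scores <- c(75, 80, 85, 90, 95)

sd(exam_scores)             # sample version, divides by n - 1
population_sd(exam_scores)  # population version, divides by n
```

The two results differ by a factor of sqrt((n - 1) / n), so the gap matters most for small samples and fades as n grows.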
Useful Variations and Related Functions in R
R offers several related functions for measuring variability and summarizing data:
- var(): Calculates the variance, which is the square of the standard deviation.
- apply(), sapply(), lapply(): Useful for applying sd() across rows or columns of matrices and data frames.
- mad(): Computes the median absolute deviation, which is another interesting and more robust measure of variability.
Let me show you sd() used with apply(), since I mentioned it. Here's how to use apply() to get the standard deviation of each column in a matrix. As before, I'll create the matrix on the fly.
measurement_matrix <- matrix(1:9, nrow = 3)
apply(measurement_matrix, 2, sd)
This returns the standard deviation for each column, giving you a quick overview of variability across multiple variables.
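The robustness of mad() mentioned in the list above is easiest to see with an outlier. In this sketch, both vectors are hypothetical and differ only in their last value:

```r
# Hypothetical measurements, with and without one extreme value
clean <- c(10, 11, 12, 13, 14)
with_outlier <- c(10, 11, 12, 13, 100)

# sd() grows sharply once the outlier is added
sd(clean)
sd(with_outlier)

# mad() is unchanged here, because the median of the absolute
# deviations from the median is the same for both vectors
mad(clean)
mad(with_outlier)
```

When a few extreme values are expected, reporting mad() next to sd() gives a fuller picture of the spread.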
Conclusion
The sd() function in R is relatively simple (it's just one line of code). But if you find yourself using it for reports or presentations and you feel a little foggy on the details, enroll in our Statistician in R career track. There's a lot of nuance in statistics and data analysis, so do make sure that you are well-versed.

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess!