R sd() Function: Standard Deviation in R

Learn how to measure variability in your data using the R sd() function. Discover practical examples and essential techniques for handling missing values and grouped data.

Jun 23, 2025 · 5 min read

Standard deviation is one of the most common ways to summarize how spread out your data is. In R, the sd() function gives you a quick way to calculate this measure of variability, whether you're working with vectors, data frames, or grouped data.

In this article, I’ll walk through the basics of using sd() in R, explore how to handle missing values, and demonstrate how to compute standard deviations across groups using functions like tapply() and packages like dplyr.

What Does sd() Do in R?

The sd() function, part of the stats package that loads with every R session, calculates the sample standard deviation of a numeric vector, such as a data frame column. It measures how much the values in your data deviate from the mean, giving you a sense of variability or dispersion.

If you're familiar with the idea of variance, the standard deviation is simply its square root. Because it is the square root of the average squared deviation (with n - 1 rather than n in the denominator), the standard deviation is on the same scale as the original data. This makes it easier to interpret in practical terms.
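
To make that relationship concrete, here is a minimal sketch that rebuilds the calculation by hand and compares it with var() and sd(). The vector values is made up purely for illustration.

```r
# Hypothetical data, used only to illustrate the formula
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(values)
squared_deviations <- (values - mean(values))^2

# Sample variance: sum of squared deviations divided by n - 1
manual_var <- sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(values))  # TRUE
all.equal(manual_sd, sd(values))    # TRUE
```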

A Simple sd() Example

Suppose you have a set of numbers and want to know how much they vary from the average.

Here’s how you’d do it in R:

exam_scores <- c(75, 80, 85, 90, 95) 
sd(exam_scores)

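The result is about 7.91. You can read it as the typical distance of a score from the mean of 85, although strictly speaking it is the square root of the average squared deviation rather than the plain average distance.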

You can use sd() with any numeric vector, including integers, doubles, or the results of calculations. R also coerces logical vectors to numeric (TRUE as 1, FALSE as 0), so sd() technically works on them too, but standard deviation is most meaningful for continuous numeric data.
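
If you want to see the logical case in action, here is a small sketch. The vector renewed is a made-up example, not data from this article.

```r
# Hypothetical logical vector: did each customer renew?
renewed <- c(TRUE, FALSE, TRUE, TRUE)

# sd() treats TRUE as 1 and FALSE as 0
sd_logical <- sd(renewed)
sd_numeric <- sd(as.numeric(renewed))

all.equal(sd_logical, sd_numeric)  # TRUE
sd_logical                         # 0.5
```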

Handling Missing Values with sd()

Real-world datasets often contain missing values. If your data includes any NA values, sd() returns NA by default.

Consider this example:

heights <- c(170, 175, NA, 180, 185)
sd(heights)

The result is NA.

To ignore missing values and calculate the standard deviation of the available numbers, set na.rm = TRUE. (You could instead remove the NA values from your dataset beforehand, but I'd advise that only if it makes sense for your analysis.)

sd(heights, na.rm = TRUE)

Now you'll get the standard deviation for 170, 175, 180, and 185, which is about 6.45.
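
As a quick sketch of the two options, the code below counts the missing values first, then compares na.rm = TRUE with filtering the NA values out by hand. The intermediate names (sd_na_rm, heights_complete, sd_filtered) are just illustrative.

```r
heights <- c(170, 175, NA, 180, 185)

# How many values are missing? Worth checking before dropping anything
sum(is.na(heights))  # 1

# Option 1: let sd() skip the missing values
sd_na_rm <- sd(heights, na.rm = TRUE)

# Option 2: drop the NA values yourself, then call sd()
heights_complete <- heights[!is.na(heights)]
sd_filtered <- sd(heights_complete)

all.equal(sd_na_rm, sd_filtered)  # TRUE
```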

Calculating Standard Deviation for Data Frames in R

You can easily calculate the standard deviation of a specific column in a data frame using the $ operator. Suppose you have a data frame of product weights:

product_data <- data.frame(
  weight = c(1.2, 1.5, 1.3, 1.7, 1.4),
  price = c(10, 12, 11, 13, 12)
)
sd(product_data$weight)

This gives you the standard deviation of the weight column. Remember, sd() is designed for numeric vectors, not entire data frames. Always select the appropriate column.
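
If you do want a standard deviation for every numeric column at once, one possible approach is to loop over the columns with sapply(). This is a sketch rather than the only way to do it, and the names numeric_columns and column_sds are my own.

```r
product_data <- data.frame(
  weight = c(1.2, 1.5, 1.3, 1.7, 1.4),
  price = c(10, 12, 11, 13, 12)
)

# Keep only numeric columns, then apply sd() to each one
numeric_columns <- Filter(is.numeric, product_data)
column_sds <- sapply(numeric_columns, sd)

column_sds  # named numeric vector with one value per column
```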

sd() and Grouped Standard Deviations

Often, you'll want to measure variability within groups, such as by category, region, or some other factor. R provides several ways to compute grouped standard deviations. I'll show three.

Using tapply() for grouped standard deviations

Suppose you have sales data for different regions and want the standard deviation for each region:

sales_amount <- c(200, 220, 210, 250, 240, 230) 
region <- c("North", "North", "South", "South", "North", "South") 
tapply(sales_amount, region, sd)

tapply() applies sd() to each group defined by region.
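
tapply() also forwards any extra arguments to the function it applies, which is handy when a group contains missing values. Here is a small sketch with one NA added to the sales figures (that NA is my own addition for illustration).

```r
# Hypothetical sales data with one missing amount
sales_amount <- c(200, 220, NA, 250, 240, 230)
region <- c("North", "North", "South", "South", "North", "South")

# Without na.rm, the South group returns NA
tapply(sales_amount, region, sd)

# Extra arguments after the function are forwarded to sd()
group_sds <- tapply(sales_amount, region, sd, na.rm = TRUE)
group_sds
```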

Using aggregate() for grouped standard deviations

If you prefer your results in a tidy data frame, aggregate() works:

sales_data <- data.frame(
  region = c("North", "North", "South", "South", "North", "South"),
  amount = c(200, 220, 210, 250, 240, 230)
)
aggregate(amount ~ region, data = sales_data, sd)

This produces a summary data frame with the standard deviation for each region.
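
One thing to watch is that the summary column keeps the original name (amount), even though it now holds standard deviations. A minimal sketch of storing the result and renaming that column, where region_sd and sd_amount are names I chose for illustration:

```r
sales_data <- data.frame(
  region = c("North", "North", "South", "South", "North", "South"),
  amount = c(200, 220, 210, 250, 240, 230)
)

region_sd <- aggregate(amount ~ region, data = sales_data, FUN = sd)

# The result is an ordinary data frame: one row per region
str(region_sd)

# Rename the summary column so it is clear that it holds a standard deviation
names(region_sd)[names(region_sd) == "amount"] <- "sd_amount"
region_sd
```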

Using dplyr for grouped standard deviations

With the dplyr package, grouped calculations are even more readable. Use summarize() after grouping:

library(dplyr) 

sales_data %>% 
   group_by(region) %>% 
   summarize(sd_amount = sd(amount))

This approach is especially useful for larger datasets or when chaining multiple data transformations. (Or if you simply prefer using the pipe operator, like I do.)
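
In practice you will often want more than one summary per group. The sketch below assumes dplyr is installed and adds the group size and mean next to the standard deviation. The column names n, mean_amount, and sd_amount are my own choices.

```r
library(dplyr)

sales_data <- data.frame(
  region = c("North", "North", "South", "South", "North", "South"),
  amount = c(200, 220, 210, 250, 240, 230)
)

region_summary <- sales_data %>%
  group_by(region) %>%
  summarize(
    n = n(),                              # observations per group
    mean_amount = mean(amount),
    sd_amount = sd(amount, na.rm = TRUE)  # na.rm guards against missing values
  )

region_summary
```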

Common Mistakes and Tips

Even a simple function like sd() can cause issues if you’re not careful. Here are some common pitfalls:

Non-numeric data: sd() is meant for numeric vectors. Passing a factor raises an error, and character data will not give a meaningful result unless you convert it first. Use is.numeric() to check your data.
Missing values: A single NA will cause sd() to return NA. Always use na.rm = TRUE if you want to ignore missing values.
Small sample sizes: sd() uses n-1 in the denominator (sample standard deviation). For vectors of length 1, sd() returns NA, not 0.
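
Here is a short sketch of how you might guard against the first and last of these pitfalls. The inputs ratings_text and single_value are hypothetical.

```r
# Hypothetical inputs to illustrate the pitfalls
ratings_text <- c("4", "5", "3")   # numbers stored as character strings
single_value <- 42

# Check the type before calling sd()
is.numeric(ratings_text)             # FALSE

# Convert explicitly, then compute
ratings <- as.numeric(ratings_text)
sd(ratings)                          # 1

# A single observation has no sample standard deviation
sd(single_value)                     # NA
```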

That last point in particular, I think, is interesting. It's a lesser-known difference, and it can trip you up if you are moving from Python to R: NumPy's np.std() divides by n by default (the population formula), whereas R's sd() divides by n - 1 (the sample formula).
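
If you need the population version (dividing by n, which is what NumPy's np.std() does by default), one way to get it is a small helper of your own. The function population_sd() below is a hypothetical helper, not something that ships with R.

```r
exam_scores <- c(75, 80, 85, 90, 95)

# Hypothetical helper: population standard deviation (divides by n)
population_sd <- function(x) {
  sqrt(mean((x - mean(x))^2))
}

sd(exam_scores)             # sample version, about 7.91
population_sd(exam_scores)  # population version, about 7.07

# The two differ by a factor of sqrt((n - 1) / n)
n <- length(exam_scores)
all.equal(population_sd(exam_scores), sd(exam_scores) * sqrt((n - 1) / n))  # TRUE
```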

Useful Variations and Related Functions in R

R offers several related functions for measuring variability and summarizing data:

var(): Calculates the variance, which is the square of the standard deviation.
apply(), sapply(), lapply(): Useful for applying sd() across rows or columns of matrices and data frames.
mad(): Computes the median absolute deviation, a measure of variability that is more robust to outliers than the standard deviation.
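
To get a feel for how these relate, here is a quick sketch using a made-up vector with one extreme value. The readings data is hypothetical.

```r
# Hypothetical measurements with one extreme value
readings <- c(10, 12, 11, 13, 12, 95)

# var() is the square of sd()
all.equal(var(readings), sd(readings)^2)  # TRUE

# The outlier inflates sd() far more than mad()
sd(readings)
mad(readings)
```

Note that mad() multiplies the raw median absolute deviation by a constant (1.4826 by default), which makes it comparable to the standard deviation for normally distributed data.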

Since I mentioned apply(), here's how to use it with sd() to get the standard deviation of each column in a matrix. As before, I'll create the matrix on the fly.

measurement_matrix <- matrix(1:9, nrow = 3) 
apply(measurement_matrix, 2, sd)

This returns the standard deviation for each column, giving you a quick overview of variability across multiple variables.
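
The second argument to apply() controls the direction. As a sketch, the same matrix can be summarized by row instead of by column (column_sds and row_sds are illustrative names):

```r
measurement_matrix <- matrix(1:9, nrow = 3)

# MARGIN = 2 works column by column, MARGIN = 1 row by row
column_sds <- apply(measurement_matrix, 2, sd)
row_sds <- apply(measurement_matrix, 1, sd)

column_sds  # 1 1 1
row_sds     # 3 3 3
```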

Conclusion

The sd() function in R is relatively simple (it's just one line of code). But if you find yourself using it for reports or presentations and you feel a little foggy on the details, enroll in our Statistician in R career track. There's a lot of nuance in statistics and data analysis, so do make sure that you are well-versed.

Author

Josef Waples
