
Sample Standard Deviation: The Key Ideas

Learn how to calculate sample standard deviation and understand its significance in statistical analysis. Explore examples and best practices for real-world data interpretation.
Sep 26, 2024  · 6 min read

The sample standard deviation is a statistical measure used to quantify the variation within a dataset. Specifically, it tells us how much individual data points in a sample differ from the sample mean.

Sample standard deviation is important in inferential statistics, where you draw conclusions about a population from a sample (or, as statisticians put it, make inferences from a sample to a population). To understand these distinctions, I recommend taking DataCamp’s Introduction to Statistics and Basic Statistics courses to familiarize yourself with statistics concepts. Finally, if you are serious about statistics and interested in a career using statistics, check out DataCamp’s guide on How to Become a Statistician.

What is Sample Standard Deviation?

The sample standard deviation is a statistical measure that quantifies the variation or dispersion in a dataset. The sample standard deviation is calculated as the square root of the sample variance, where the sample variance is the sum of the squared differences from the mean, divided by the sample size minus one. We use the sample standard deviation when we want to know how much the data points in a sample differ from the sample mean.
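As a quick check of this definition, the sketch below uses Python's standard-library statistics module, where variance() and stdev() both divide by n − 1. The five values are illustrative (the same ones used in the R and Python examples later in this tutorial).

```python
import math
import statistics

# Illustrative sample of five values
data = [10, 12, 15, 18, 20]

sample_var = statistics.variance(data)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)      # square root of the sample variance

print(float(sample_var))    # 17.0
print(round(sample_sd, 2))  # 4.12

# The standard deviation is the square root of the variance
print(math.isclose(sample_sd, math.sqrt(sample_var)))  # True
```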

How to Calculate Sample Standard Deviation

We use a specific formula when calculating the sample standard deviation.

Formula for sample standard deviation

Here is the sample standard deviation formula:

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} $$

Sample standard deviation formula.

Where:

  • s = sample standard deviation
  • n = number of data points in the sample
  • x = individual data point
  • x̄ = sample mean
  • n-1 = the degrees of freedom, used in place of n for sample data (Bessel’s correction)

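Note that when calculating the sample standard deviation, we use n-1 in the denominator. Because the deviations are measured from the sample mean, which is computed from the same data, they are on average slightly smaller than deviations from the true population mean; dividing by n-1 instead of n corrects this bias in the variance. This is known as Bessel’s correction. If we were interested in the population standard deviation, we would use n in the denominator instead.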
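For comparison, the population standard deviation (usually written σ) measures deviations from the population mean μ and divides by the population size N:

$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} $$

For a single sample, dividing the same sum of squared deviations by n-1 rather than n scales the variance up by a factor of n/(n-1):

$$ s^2 = \frac{n}{n-1} \cdot \frac{\sum (x - \bar{x})^2}{n} $$

With n = 5, for instance, the factor is 1.25.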

How to calculate sample standard deviation by hand

Let us walk through the following example to see how to calculate the sample standard deviation using the formula. Assume you have a data subset with the following values:

You should follow these steps:

Step 1: Calculate the sample mean

Find the mean of the data points.

Step 2: Find the squared differences from the mean

For each data point, subtract the mean and square the result.

Step 3: Sum the squared deviations

Add the squared differences from the mean.

Step 4: Input the calculated variables in the formula

Substitute the values in the formula to get the sample standard deviation.
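The four steps can be sketched in plain Python as follows. This sketch uses an illustrative sample of five values rather than the dataset from the hand-worked example; swap in your own values to check a hand calculation.

```python
import math

# Illustrative sample (an assumption, not the hand-worked dataset)
data = [10, 12, 15, 18, 20]
n = len(data)

# Step 1: sample mean
mean = sum(data) / n

# Step 2: squared difference of each point from the mean
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: sum of the squared differences
sum_sq = sum(squared_diffs)

# Step 4: divide by n - 1 (Bessel's correction) and take the square root
s = math.sqrt(sum_sq / (n - 1))

print(mean)           # 15.0
print(squared_diffs)  # [25.0, 9.0, 0.0, 9.0, 25.0]
print(sum_sq)         # 68.0
print(round(s, 2))    # 4.12
```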

From the above calculation, we see that the sample standard deviation for the data is approximately 2.28. If calculating the sample standard deviation by hand is impractical and you would rather use a tool like Excel, check out our tutorial on How to Calculate Standard Deviation in Excel.

Why Use Sample Standard Deviation?

Sample standard deviation is the preferred measure of variability when you need to estimate the population standard deviation from a sample. Because the sample mean is calculated from the same data points, deviations from it tend to be smaller than deviations from the true population mean, so dividing by n would underestimate the population variance. Bessel's correction (n−1) makes the denominator smaller, which increases the calculated variance and standard deviation and offsets this underestimation. The effect is largest for small samples; as n grows, the difference between dividing by n and by n−1 becomes negligible.
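A small simulation makes the bias visible. The sketch below assumes normally distributed data with a known population standard deviation of 10 (so a true variance of 100) and samples of size 5. It draws many samples and averages the variance computed with each denominator.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

TRUE_SD = 10.0    # assumed population standard deviation (true variance = 100)
N = 5             # small sample size, where the bias is most noticeable
TRIALS = 100_000  # number of simulated samples

var_divide_by_n = []
var_divide_by_n_minus_1 = []

for _ in range(TRIALS):
    sample = [random.gauss(0.0, TRUE_SD) for _ in range(N)]
    mean = sum(sample) / N
    sum_sq = sum((x - mean) ** 2 for x in sample)
    var_divide_by_n.append(sum_sq / N)                # population-style denominator
    var_divide_by_n_minus_1.append(sum_sq / (N - 1))  # Bessel's correction

avg_n = statistics.fmean(var_divide_by_n)
avg_n_minus_1 = statistics.fmean(var_divide_by_n_minus_1)

print(f"true variance:                {TRUE_SD ** 2:.1f}")
print(f"average with n denominator:   {avg_n:.1f}")          # close to 80
print(f"average with n-1 denominator: {avg_n_minus_1:.1f}")  # close to 100
```

The n denominator averages about (n−1)/n × 100 = 80, which is exactly the shortfall that dividing by n−1 removes.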

Importance of sample standard deviation

The sample standard deviation is important in estimating population parameters. Let's take a look.

Calculating confidence intervals 

Confidence intervals can be calculated using either the sample standard deviation or the population standard deviation, depending on which is available. If the population standard deviation is unknown, the sample standard deviation is used, and a t-distribution is applied to estimate the range of values likely to contain the population mean. A smaller standard deviation from the sample or population leads to a narrower confidence interval, indicating greater precision in the estimate.
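As a sketch, a 95% confidence interval for the mean when the population standard deviation is unknown is x̄ ± t · s/√n. To keep the example dependency-free, the two-sided t critical value for 4 degrees of freedom (about 2.776) is hard-coded from a t-table; in practice you might look it up with a library such as SciPy (scipy.stats.t.ppf). The data are illustrative.

```python
import math
import statistics

data = [10, 12, 15, 18, 20]  # illustrative sample
n = len(data)
mean = statistics.fmean(data)
s = statistics.stdev(data)   # sample standard deviation (n - 1)

# Two-sided 95% critical value of the t-distribution with n - 1 = 4 degrees
# of freedom, hard-coded from a t-table; it only applies when n == 5.
T_CRIT_DF4 = 2.776

standard_error = s / math.sqrt(n)
margin = T_CRIT_DF4 * standard_error
low, high = mean - margin, mean + margin

print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (9.88, 20.12)
```

A smaller s shrinks the standard error and therefore the margin, which is why less variable samples give narrower intervals.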

Hypothesis testing 

In hypothesis testing, either the sample standard deviation or the population standard deviation can be used, depending on which is known. If the population standard deviation is unknown, the sample standard deviation is applied, and the t-distribution is typically used to evaluate the test statistic. When the population standard deviation is known, the z-distribution is used instead, providing more precision. A smaller standard deviation, whether from the sample or population, results in a more sensitive test, making it easier to detect significant differences.
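For example, a one-sample t statistic compares the sample mean with a hypothesized population mean μ₀ using the sample standard deviation: t = (x̄ − μ₀) / (s/√n). The sketch assumes a hypothetical μ₀ of 12 and reuses the hard-coded two-sided 5% critical value of 2.776 for 4 degrees of freedom.

```python
import math
import statistics

data = [10, 12, 15, 18, 20]  # illustrative sample
mu_0 = 12                    # hypothetical population mean under the null hypothesis
n = len(data)

s = statistics.stdev(data)   # population SD unknown, so use the sample SD
t_stat = (statistics.fmean(data) - mu_0) / (s / math.sqrt(n))

T_CRIT_DF4 = 2.776           # two-sided 5% critical value for df = 4, from a t-table
reject_null = abs(t_stat) > T_CRIT_DF4

print(round(t_stat, 3))  # 1.627
print(reject_null)       # False
```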

Let’s take a look at some of the terms that are related to sample standard deviation. Confusing these terms can lead to common mistakes. 

Standard deviation vs. variance

Sample standard deviation and variance both measure variability in a dataset. However, they express variability differently. The variance measures the average squared difference between each data point and the mean (for a sample, the sum of squared differences is divided by n−1 rather than n). Thus, the variance is expressed in squared units. On the other hand, the standard deviation is the square root of the variance, so it is expressed in the same units as the data.
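One way to see the difference in units is to rescale the data. The sketch below assumes a few illustrative lengths recorded in meters and converts them to centimeters. The standard deviation grows by the same factor of 100 as the data, while the variance grows by 100², because it is measured in squared units.

```python
import statistics

lengths_m = [1.0, 1.2, 1.5, 1.8, 2.0]       # illustrative lengths in meters
lengths_cm = [x * 100 for x in lengths_m]   # the same lengths in centimeters

sd_ratio = statistics.stdev(lengths_cm) / statistics.stdev(lengths_m)
var_ratio = statistics.variance(lengths_cm) / statistics.variance(lengths_m)

print(round(sd_ratio, 6))   # 100.0
print(round(var_ratio, 6))  # 10000.0
```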

Sample standard deviation vs. population standard deviation

Let's take a look at two scenarios to help differentiate between sample standard deviation and population standard deviation.

Working with sample data

The sample standard deviation is used when you work with a subset of a larger population. Imagine you are conducting a customer satisfaction survey for a retail company. You collect responses from about 200 customers to make inferences about the satisfaction of the retail chain's entire customer base. In this case, you will use the sample standard deviation, since you are working with a subset of a large population.

Working with population data

The population standard deviation is appropriate when you have data for every member of the population. In this case, no correction is needed, because the mean you compute is the true population mean. For example, imagine you work in HR at a company with 50 employees and want to measure the variability of their salaries. You would use the population standard deviation, since you know every employee's salary.
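In Python's standard library, the two scenarios map onto two functions: statistics.stdev() divides by n − 1 for sample data, and statistics.pstdev() divides by n for a complete population. The figures below are made up purely for illustration and are much smaller than the scenarios described above.

```python
import statistics

# Hypothetical satisfaction scores (1-10) from a few surveyed customers:
# a sample drawn from a much larger customer base.
survey_scores = [7, 8, 6, 9, 7, 8]

# Hypothetical salaries for every employee of a small company: the full population.
all_salaries = [52_000, 48_000, 61_000, 55_000, 70_000]

survey_sd = statistics.stdev(survey_scores)   # sample SD, n - 1 denominator
salary_sd = statistics.pstdev(all_salaries)   # population SD, n denominator

print(round(survey_sd, 2))  # 1.05
print(round(salary_sd, 2))  # 7678.54
```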

Sample Standard Deviation in Python and R

When using R to calculate standard deviation, the sd() function computes the sample standard deviation by default with n−1 in the denominator. 

# Sample standard deviation
data <- c(10, 12, 15, 18, 20)
sample_sd <- sd(data)
print(round(sample_sd, 2))
## [1] 4.12

Python libraries differ in their defaults. NumPy's numpy.std() uses n in the denominator by default (ddof=0), treating the data as a population, whereas pandas' Series.std() and DataFrame.std() default to the sample standard deviation (ddof=1).

# Import numpy library
import numpy as np
data = [10, 12, 15, 18, 20]
population_sd = np.std(data)  # Population standard deviation
print(round(population_sd, 2))
# 3.69

If you want the sample standard deviation instead, pass ddof=1 (delta degrees of freedom). NumPy divides by n - ddof, so ddof=1 gives the n−1 denominator.

sample_sd = np.std(data, ddof=1)  # Sample standard deviation
print(round(sample_sd, 2))
# 4.12
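A quick check with pandas (assuming it is installed) shows that its std() method uses ddof=1 by default, unlike numpy.std(), so you pass ddof=0 to get the population value:

```python
import pandas as pd

series = pd.Series([10, 12, 15, 18, 20])

sample_sd = series.std()             # ddof=1 by default: sample standard deviation
population_sd = series.std(ddof=0)   # n in the denominator: population standard deviation

print(round(sample_sd, 2))      # 4.12
print(round(population_sd, 2))  # 3.69
```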

If you are working with statistics in a programming environment, I would recommend our Statistical Inference with R skill track or the Foundations of Inference in Python course to develop a core set of skills. 

Conclusion

Sample standard deviation is an important statistical measure for quantifying variation and making inferences. If you want to advance your knowledge in statistics and data analysis, I recommend checking out DataCamp’s Data Analyst with R and Data Analyst with Python career tracks, which offer comprehensive, industry-relevant skills. Our Data Analysis in Excel course is also suitable if you want to become a business analyst focused on making data-driven decisions.


Author
Allan Ouko
I create articles that simplify data science and analytics, making them easy to understand and accessible.

Frequently Asked Questions

What is the difference between sample and population standard deviation?

Sample standard deviation is used when calculating variability from a subset of data to make inferences about the population. Population standard deviation is used when you have data for every member of the population.

Why do we use n−1 instead of n when calculating sample standard deviation?

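We use n−1 because we are working with a sample, not the entire population. Deviations measured from the sample mean are, on average, slightly smaller than deviations from the true population mean, so dividing by n would underestimate variability. Dividing by n−1 makes the sample variance an unbiased estimate of the population variance and gives a better estimate of the population standard deviation.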

How is sample standard deviation different from variance?

Variance is the average of the squared differences from the mean, expressed in squared units. Standard deviation is the square root of the variance and is expressed in the same units as the data.

Can the sample standard deviation be negative?

No, sample standard deviation cannot be negative because it is the square root of the variance, which is always a non-negative number.

What types of data are appropriate for calculating standard deviation?

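Standard deviation can be calculated for any numerical data, but it is most informative for roughly symmetric, bell-shaped distributions such as the normal distribution. It is not appropriate for categorical data, and it can be misleading for heavily skewed distributions or data with extreme outliers.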

Related

blog

Data Demystified: An Overview of Descriptive Statistics

In the fifth entry of data demystified, we provide an overview of the basics of descriptive statistics, one of the fundamental areas of data science.

Richie Cotton

6 min

cheat-sheet

Descriptive Statistics Cheat Sheet

In this descriptive statistics cheat sheet, you'll learn about the most common statistical techniques for descriptive analytics.

Richie Cotton

5 min

tutorial

How to Calculate Standard Deviation in Excel

To calculate standard deviation in Excel, enter your data into a range of cells and use either =STDEV.S() for sample data or =STDEV.P() for population data.

Arunn Thevapalan

10 min

tutorial

How to Calculate the Coefficient of Variation in Excel

Calculating the coefficient of variation in Excel is a straightforward process: we divide STDEV.P by AVERAGE.

Arunn Thevapalan

8 min

tutorial

Understanding Skewness And Kurtosis And How to Plot Them

A comprehensive visual guide to skewness/kurtosis and how they affect distributions and, ultimately, your data science project.

Bex Tuychiev

10 min

tutorial

Simple Linear Regression: Everything You Need to Know

Learn simple linear regression. Master the model equation, understand key assumptions and diagnostics, and learn how to interpret the results effectively.

Josef Waples

7 min
