Sample Standard Deviation: The Key Ideas

Learn how to calculate sample standard deviation and understand its significance in statistical analysis. Explore examples and best practices for real-world data interpretation.

Sep 26, 2024 · 6 min read

The sample standard deviation is a statistical measure used to quantify the variation within a dataset. Specifically, it tells us how much individual data points in a sample differ from the sample mean.

Sample standard deviation matters in inferential statistics, where the goal is to draw conclusions about a population from a sample or, as statisticians might put it, to make inferences from a sample to a population. To get comfortable with these ideas, I recommend taking DataCamp’s Introduction to Statistics and Basic Statistics courses. As a final note, if you are serious about a career using statistics, check out DataCamp’s guide on How to Become a Statistician.

What is Sample Standard Deviation?

The sample standard deviation is a statistical measure that quantifies the variation or dispersion in a dataset. The sample standard deviation is calculated as the square root of the sample variance, where the sample variance is the sum of the squared differences from the mean, divided by the sample size minus one. We use the sample standard deviation when we want to know how much the data points in a sample differ from the sample mean.

How to Calculate Sample Standard Deviation

We use a specific formula when calculating the sample standard deviation.

Formula for sample standard deviation

Here is the sample standard deviation formula:

Sample standard deviation formula. Image by Author.
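
Because the formula is shown as an image, here it is written out in LaTeX as well. This is a reconstruction from the symbol definitions in this section; the index i on each data point x is added notation, not taken from the original image.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```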

Where:

s = sample standard deviation
n = number of data points in the sample
x = individual data point
x̄ = sample mean
n-1 = the correction factor for sample data

Note that when calculating the sample standard deviation, we use n-1 in the denominator to correct the bias that arises because deviations are measured from the sample mean rather than from the unknown population mean. This is known as Bessel’s correction. If we were interested in the population standard deviation, we would use n in the denominator instead.

How to calculate sample standard deviation by hand

Let us walk through the following example to see how to calculate the sample standard deviation using the formula. Assume you have a data subset with the following values:

You should follow these steps:

Step 1: Calculate the sample mean

Find the mean of the data points.

Step 2: Find the squared differences from the mean

Subtract the mean from each data point and square the result.

Step 3: Sum the squared deviations

Add the squared differences from the mean.

Step 4: Input the calculated variables in the formula

Substitute the values into the formula: divide the sum of squared deviations by n-1, then take the square root to get the sample standard deviation.

From the above calculation, we see that the standard deviation for the data is approximately 2.28. Now, if calculating the sample standard deviation by hand is not practical or helpful, and you need to use a tool like Excel, check out our tutorial on How to Calculate Standard Deviation in Excel.
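
The four steps above can also be sketched in plain Python. This is a minimal illustration, not the worked example itself: the data values are borrowed from this article's R and Python section, so the result differs from the 2.28 obtained by hand.

```python
import math

# Values reused from this article's R/Python section, not the hand-worked example.
data = [10, 12, 15, 18, 20]
n = len(data)

# Step 1: calculate the sample mean
mean = sum(data) / n

# Step 2: find the squared differences from the mean
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: sum the squared deviations
sum_sq = sum(squared_diffs)

# Step 4: divide by n - 1 (Bessel's correction) and take the square root
sample_sd = math.sqrt(sum_sq / (n - 1))

print(mean)                 # 15.0
print(sum_sq)               # 68.0
print(round(sample_sd, 2))  # 4.12
```

Keeping each step in its own variable mirrors the hand calculation and makes it easy to check intermediate values.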

Why Use Sample Standard Deviation?

Sample standard deviation is the preferred measure of variability when we need to estimate the population standard deviation from a sample. Data points sit closer, on average, to their own sample mean than to the true population mean, so dividing the sum of squared deviations by n would tend to underestimate the population variance. Bessel's correction (n−1) adjusts for this bias: making the denominator smaller increases the calculated variance and standard deviation. The adjustment matters most for small samples and shrinks as n grows.
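
The idea that dividing by n underestimates variability can be checked exactly on a tiny example. The sketch below is an illustration under stated assumptions: it treats five values (reused from this article's code examples) as the whole population, lists every possible sample of size 2 drawn with replacement, and averages the two variance estimates over all of them.

```python
from itertools import product
from statistics import fmean, pvariance, variance

# A tiny "population", reusing the values from this article's code examples.
population = [10, 12, 15, 18, 20]
true_variance = pvariance(population)  # divides by N

# Every possible ordered sample of size 2, drawn with replacement (25 in total).
samples = list(product(population, repeat=2))

# Average each variance estimate over all possible samples.
avg_div_n = fmean([pvariance(s) for s in samples])         # denominator n
avg_div_n_minus_1 = fmean([variance(s) for s in samples])  # denominator n - 1

print(round(true_variance, 2))      # 13.6
print(round(avg_div_n, 2))          # 6.8
print(round(avg_div_n_minus_1, 2))  # 13.6
```

With n in the denominator, the average estimate is only half the true variance here, while n-1 recovers it on average. The check is on the variance; taking the square root gives a standard deviation estimate that can still be slightly biased.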

Importance of sample standard deviation

The sample standard deviation is important in estimating population parameters. Let's take a look.

Calculating confidence intervals

Confidence intervals can be calculated using either the sample standard deviation or the population standard deviation, depending on which is available. If the population standard deviation is unknown, the sample standard deviation is used, and a t-distribution is applied to estimate the range of values likely to contain the population mean. A smaller standard deviation from the sample or population leads to a narrower confidence interval, indicating greater precision in the estimate.
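
As a rough sketch of the t-based case, the snippet below builds a 95% confidence interval for the mean from the sample standard deviation. The data values are reused from this article's code examples, and the critical value 2.776 is an assumption taken from a standard t-table for 4 degrees of freedom rather than computed in code.

```python
import math
from statistics import mean, stdev

data = [10, 12, 15, 18, 20]  # values reused from this article's code examples
n = len(data)

x_bar = mean(data)  # sample mean
s = stdev(data)     # sample standard deviation (n - 1 denominator)
standard_error = s / math.sqrt(n)

# Two-sided 95% critical value of the t-distribution with n - 1 = 4 degrees
# of freedom, from a standard t-table (scipy.stats.t.ppf(0.975, df=4)
# gives the same value to three decimals).
t_crit = 2.776

margin = t_crit * standard_error
lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 2), round(upper, 2))  # 9.88 20.12
```

A smaller s shrinks the standard error and therefore the margin, which is the narrowing effect described above.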

Hypothesis testing

In hypothesis testing, either the sample standard deviation or the population standard deviation can be used, depending on which is known. If the population standard deviation is unknown, the sample standard deviation is applied, and the t-distribution is typically used to evaluate the test statistic. When the population standard deviation is known, the z-distribution is used instead; its critical values are smaller than those of the t-distribution because there is no extra uncertainty from estimating the standard deviation. A smaller standard deviation, whether from the sample or population, produces a larger test statistic for the same observed difference, making it easier to detect significant differences.
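
To make the t-test case concrete, here is a minimal sketch of a one-sample t statistic. The hypothesized mean mu_0 = 12 is a made-up value for illustration, the data are reused from this article's code examples, and the critical value is again assumed from a standard t-table.

```python
import math
from statistics import mean, stdev

data = [10, 12, 15, 18, 20]  # values reused from this article's code examples
mu_0 = 12                    # hypothetical null-hypothesis mean (an assumption)
n = len(data)

# One-sample t statistic: (sample mean - hypothesized mean) / standard error
t_stat = (mean(data) - mu_0) / (stdev(data) / math.sqrt(n))

# Two-sided 5% critical value for n - 1 = 4 degrees of freedom (standard t-table)
t_crit = 2.776

print(round(t_stat, 2))      # 1.63
print(abs(t_stat) > t_crit)  # False: not enough evidence to reject the null
```

Because the sample standard deviation sits in the denominator, a less variable sample with the same mean would give a larger t statistic.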

Sample Standard Deviation and Related Terms

Let’s take a look at some of the terms related to sample standard deviation. Confusing them is a common source of mistakes.

Standard deviation vs. variance

Sample standard deviation and variance both measure variability in a dataset. However, they express variability differently. The variance of a dataset measures the average of the squared differences between each data point and the mean. Thus, the variance is expressed in squared units. On the other hand, the standard deviation is calculated as the square root of the variance, so the standard deviation is expressed in the same units as the data.
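
The difference in units is easy to see in code. In this sketch the values are reused from this article's examples, and treating them as lengths in centimetres is purely an assumption for illustration.

```python
from statistics import stdev, variance

# Hypothetical lengths in centimetres (values reused from this article's examples)
lengths_cm = [10, 12, 15, 18, 20]

var_cm2 = variance(lengths_cm)  # sample variance, in squared units (cm^2)
sd_cm = stdev(lengths_cm)       # sample standard deviation, in the data's units (cm)

print(f"{var_cm2:.2f} cm^2")  # 17.00 cm^2
print(f"{sd_cm:.2f} cm")      # 4.12 cm
```

The standard deviation of about 4.12 cm can be compared directly with the data, whereas 17 cm² cannot.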

Sample standard deviation vs. population standard deviation

Let's take a look at two scenarios to help differentiate between sample standard deviation and population standard deviation.

Working with sample data

The sample standard deviation is used when you work with a subset of a larger population. Imagine you are conducting a customer satisfaction survey for a retail company. You collect responses from about 200 customers to make inferences about the satisfaction of the retail chain's entire customer base. In this case, you will use the sample standard deviation, since the 200 responses are only a subset of that population.

Working with population data

The population standard deviation is appropriate when you have the complete data for an entire population. In this case, there is no sampling bias to correct, since every data point is included. For example, imagine you work in HR at a company with 50 employees and want to calculate the variability of their salaries. You will use the population standard deviation since you know the salary of every employee.

Sample Standard Deviation in Python and R

When using R to calculate standard deviation, the sd() function computes the sample standard deviation by default with n−1 in the denominator.

# Sample standard deviation
data <- c(10, 12, 15, 18, 20)
sample_sd <- sd(data)
print(round(sample_sd, 2))

4.12

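However, Python libraries do not all share this default. NumPy assumes the population standard deviation unless otherwise specified: numpy.std() uses n in the denominator by default (ddof=0), treating the data as a population. pandas behaves differently: Series.std() and DataFrame.std() default to ddof=1, which gives the sample standard deviation.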

# Import numpy library
import numpy as np
data = [10, 12, 15, 18, 20]
population_sd = np.std(data)  # Population standard deviation
print(round(population_sd, 2))

3.69

If you want the sample standard deviation instead, you can pass ddof=1 to adjust for degrees of freedom.

sample_sd = np.std(data, ddof=1)  # Sample standard deviation
print(round(sample_sd, 2))

4.12
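
Python's standard library is another option, and it makes the choice explicit in the function name: statistics.stdev() uses n-1 and statistics.pstdev() uses n. A short sketch with the same data:

```python
from statistics import pstdev, stdev

data = [10, 12, 15, 18, 20]

sample_sd = stdev(data)       # sample standard deviation (n - 1 denominator)
population_sd = pstdev(data)  # population standard deviation (n denominator)

print(round(sample_sd, 2))      # 4.12
print(round(population_sd, 2))  # 3.69
```

Both values match the NumPy results above, with no ddof argument to remember.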

If you are working with statistics in a programming environment, I would recommend our Statistical Inference with R skill track or the Foundations of Inference in Python course to develop a core set of skills.

Conclusion

Sample standard deviation is an important statistical measure for quantifying variation and making inferences. If you want to advance your knowledge in statistics and data analysis, I recommend checking out DataCamp’s Data Analyst with R and Data Analyst with Python career tracks, which offer comprehensive, industry-relevant skills. Our Data Analysis in Excel course is also suitable if you want to become a business analyst focused on making data-driven decisions.

Author

Allan Ouko

Frequently Asked Questions

What is the difference between sample and population standard deviation?

Why do we use n−1 instead of n when calculating sample standard deviation?

How is sample standard deviation different from variance?

Can the sample standard deviation be negative?

What types of data are appropriate for calculating standard deviation?
