
Normality Test: How to Check If Your Data Is Normally Distributed

Learn what a normality test is, why it matters, and how to use common tests like Shapiro-Wilk, Kolmogorov-Smirnov, and visual methods to check your data + examples in Python and R.
19 Mar 2026  · 14 min read

Have you ever run a t-test, got a clean p-value, and then realized you never checked if your data was normally distributed?

Statistical tests don't warn you when their assumptions are violated. They just report a value. The problem is that tests like t-tests and ANOVA assume your data follows a normal distribution. If that's not the case, you're building conclusions on shaky foundations.

Normality tests give you a way to verify that assumption. There are both visual and statistical methods for doing this, and knowing which to use - and how to read the results - is what lets you stand behind your conclusions with confidence.

In this article, I'll walk you through the most common visual and statistical methods for checking normality, show you how to run them in Python and R, and explain what to do when your data doesn't pass the test.

What Is a Normal Distribution in Practice

You've probably seen the bell curve before - but here's what it actually means for your data.

A normal distribution is a pattern where most values cluster around the center, and fewer values appear as you move further out in either direction. Plot it, and you get a symmetric, bell-shaped curve. The left side mirrors the right side.

Normal distribution plot

What makes normal distribution unique is that the mean, median, and mode all land on the same point - the center of the bell. There's no skew to the left or right. In other words, the data is balanced.
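You can check this balance numerically: for a large sample drawn from a normal distribution, the sample mean and median land almost on top of each other. A minimal sketch with NumPy (the loc and scale values here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=5, size=100_000)

mean = data.mean()
median = np.median(data)

# For a symmetric distribution, mean and median nearly coincide
print(f"mean={mean:.2f}, median={median:.2f}")
```

For skewed data - income, for example - the mean gets pulled toward the long tail while the median stays put, so a large gap between the two is itself a quick skew check.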

This shows up constantly in real-world measurement data. Human height, blood pressure readings, manufacturing tolerances, test scores - these all tend to follow a normal distribution when you collect enough samples. Natural variation in biological and physical systems tends to produce this shape.

That said, not all data behaves this way. Income data skews right. Website response times have long tails.

In the real world, things can go horribly wrong if you assume normality without checking.

Why Testing for Normality Matters

The problem with not checking for normality is that most common statistical tests - t-tests, ANOVA - are parametric tests.

That means they're built on assumptions about your data's distribution. Normality is one of them. When that assumption breaks, the test's math breaks with it. You’ll still get the result from the test, but it might lead you to wrong conclusions.

Parametric tests work by making mathematical assumptions about the population your sample comes from. When those assumptions hold, these tests are useful and accurate. When they don't, your p-values become unreliable and you can’t make accurate conclusions.

That's where non-parametric tests come in.

Tests like Mann-Whitney U or Kruskal-Wallis don't assume normality - they work with ranks instead of raw values. They're more flexible, but they also tend to have less statistical power when your data actually is normal. So switching to them unnecessarily isn't the answer.

The real mistake many newcomers to data science make is skipping the check entirely.

Normality testing takes a few lines of code. Not testing means you're either trusting your data on faith - or not thinking about it at all.

Visual Methods to Check Normality

Before running any formal test, plot your data. Visuals will tell you a lot about what you're working with.

Histogram

A histogram shows you the shape of your distribution.

Example histogram

If your data is normally distributed, the histogram should look like a bell curve - tall in the middle, tapering symmetrically on both sides. What you're watching for is skewness: a long tail pulling to the right means positive skew, a tail pulling left means negative skew. Either way, that's a sign your data might not be normal.

The problem with histograms is that their shape depends on bin size:

  • Too few bins and the distribution looks flat
  • Too many and it looks jagged

Always try a couple of bin sizes before drawing conclusions.
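One way to see the bin-size effect without plotting anything is to compute histogram counts directly with `np.histogram`. A quick sketch (the bin counts 5, 30, and 200 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=1_000)

# Same data, three bin counts: too few bins flatten the shape,
# too many spread the points thin and make it look jagged
for bins in (5, 30, 200):
    counts, edges = np.histogram(data, bins=bins)
    print(f"{bins:>3} bins -> tallest bar holds {counts.max()} points")
```

With 200 bins on 1,000 points, many bins hold only a handful of observations, which is exactly why the plotted shape turns noisy.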

Q-Q plot

A Q-Q plot (quantile-quantile plot) compares your data's quantiles against the quantiles of a theoretical normal distribution.

Example Q-Q plot

If your data is normal, the points fall along a straight diagonal line. Deviations from that line tell you where normality breaks down. Points curving upward at the ends suggest heavy tails. An S-shaped curve points to skewness.

Q-Q plots are more precise than histograms for spotting subtle departures from normality - especially in the tails, where histograms tend to miss things.
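In Python, `scipy.stats.probplot` computes the Q-Q points and fits the reference line for you. A sketch that skips the plotting and just reads off the fit (the `r` value measures how tightly the points hug the line):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(size=100)

# probplot returns the theoretical vs. ordered sample quantiles,
# plus a least-squares fit of the reference line (slope, intercept, r)
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")

# r close to 1 means the points follow the reference line closely
print(f"correlation with the reference line: r = {r:.4f}")
```

Passing the same result to matplotlib (via the `plot` argument of `probplot`) produces the familiar Q-Q plot.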

Box plot

A box plot shows you the median, spread, and outliers in one view.

Example box plot

A normally distributed dataset produces a box plot where the median sits roughly in the center of the box, and the whiskers extend to about equal lengths on both sides. If the median is off-center, or one whisker is much longer than the other, that's skew. Dots outside the whiskers are outliers.
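The numbers behind a box plot are just quartiles. Here's a sketch of how the box, whiskers, and outlier dots are derived, using Tukey's 1.5 × IQR rule (the convention most plotting libraries use by default):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=1_000)

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # the height of the box

# Tukey's rule: whiskers reach the last point within 1.5 * IQR of the box;
# anything beyond that is drawn as an outlier dot
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"median={median:.2f}, IQR={iqr:.2f}, outliers={len(outliers)}")
```

For normal data, the 1.5 × IQR rule flags only about 0.7% of points as outliers, so seeing a handful of dots on a large sample is expected, not alarming.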

The general issue with visuals is that they're subjective. Two people can look at the same histogram and disagree. Use them to get a feel for your data first, then confirm with a formal test.

Common Normality Tests in Statistics

There's no single normality test that works best in every situation. The right one depends on your sample size and what you're trying to detect.

Shapiro-Wilk test

The Shapiro-Wilk test is the go-to choice for small to medium samples, generally up to a few hundred observations.

It measures how closely your data matches a normal distribution by comparing the observed values against what you'd expect if the data were normal. It's widely used, well-understood, and available in every major stats library. For most analysts, this is the first test to reach for.

Its main limitation is that it becomes oversensitive at large sample sizes. It tends to flag tiny, practically meaningless deviations as statistically significant.

Kolmogorov-Smirnov test

The Kolmogorov-Smirnov (KS) test compares your sample's cumulative distribution against a theoretical one - in this case, normal.

It's more general than Shapiro-Wilk and can test against any distribution, not just normal. The KS test is less powerful than Shapiro-Wilk for normality testing, meaning it's less likely to catch subtle departures. It also requires you to specify the distribution parameters upfront, which introduces bias if you estimate them from the same data.

Use it when you need a quick, general-purpose check - not as your primary normality test.

Anderson-Darling test

The Anderson-Darling test is a variation of the KS test, but with one key difference: it gives more weight to the tails of the distribution.

This makes it better at catching deviations that show up at the extremes - heavy tails, outliers, or non-normal behavior that the KS test would miss. If your use case is sensitive to tail behavior, Anderson-Darling is a good choice.
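The test is available in Python as `scipy.stats.anderson`. One quirk worth knowing: unlike `shapiro()`, it returns critical values rather than a p-value, so you compare the statistic against the critical value for your chosen significance level. A minimal sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(size=100)

result = stats.anderson(data, dist="norm")

# Reject normality at a given significance level if the statistic
# exceeds the corresponding critical value
print(f"statistic: {result.statistic:.4f}")
for crit, sig in zip(result.critical_values, result.significance_level):
    verdict = "reject" if result.statistic > crit else "fail to reject"
    print(f"  at {sig}%: critical value {crit:.3f} -> {verdict}")
```

For the normal case, `significance_level` covers the 15%, 10%, 5%, 2.5%, and 1% levels.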

D'Agostino-Pearson test

The D'Agostino-Pearson test takes a different approach.

Instead of comparing distributions directly, it measures two properties of your data: skewness (asymmetry) and kurtosis (how heavy or light the tails are).

It combines both into a single test statistic. This makes it good at pinpointing why your data might not be normal - not just whether it is. It works best with larger samples, where skewness and kurtosis estimates are reliable.

Jarque-Bera test

The Jarque-Bera test also uses skewness and kurtosis, similar to D'Agostino-Pearson.

It's common in econometrics and time series analysis. Like D'Agostino-Pearson, it needs a reasonably large sample to produce reliable results; with small samples its conclusions can be misleading. If you're working in a finance or economics context, you'll likely see this one often.
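In Python, the test is one call to `scipy.stats.jarque_bera`. A sketch on a larger sample, where skewness and kurtosis estimates are more stable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(size=500)

# jarque_bera combines sample skewness and kurtosis into one statistic;
# large values indicate a departure from normality
stat, p = stats.jarque_bera(data)
print(f"statistic: {stat:.4f}, p-value: {p:.4f}")
```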

To conclude, start with Shapiro-Wilk for small samples and pair it with a Q-Q plot. Use Anderson-Darling when tail behavior matters, and D'Agostino-Pearson when you want to understand the nature of the deviation.

How to Interpret Normality Test Results

Every normality test is a hypothesis test.

The null hypothesis in any normality test is that your data is normally distributed. The test then asks: if the data really were normal, how likely would we be to see a sample that deviates this much?

The answer comes back as a p-value:

  • p > 0.05 - you don't have enough evidence to reject normality. Assume the data is normal and proceed with parametric tests
  • p < 0.05 - the data differs from normality enough to be statistically detectable. Reject the normality assumption

Sounds simple, but many people go wrong here.

A low p-value doesn't tell you how non-normal your data is - only that a difference was detected. With large samples, normality tests become extremely sensitive. They'll flag deviations so small they have no real impact on your analysis.

The opposite problem also exists. With small samples, even visibly skewed data can produce p > 0.05 because the test doesn't have enough power to detect the deviation.

Statistical significance and practical significance aren't the same thing.

A p-value tells you whether a departure from normality exists. It doesn't tell you whether that departure matters for your specific analysis. Always pair your test result with a Q-Q plot - if the points follow the line closely, your data is probably normal enough, regardless of what the p-value says.
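A quick way to feel the sample-size effect is to test the same mildly non-normal population at two sizes. In this sketch, the population is a t-distribution with 10 degrees of freedom - slightly heavier-tailed than normal. The exact p-values depend on the seed, but the tendency is that Shapiro-Wilk detects the heavy tails at the large sample size while missing them at the small one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Same mildly heavy-tailed population, tested at two sample sizes
small = rng.standard_t(df=10, size=30)
large = rng.standard_t(df=10, size=4000)

for name, sample in [("n=30", small), ("n=4000", large)]:
    stat, p = stats.shapiro(sample)
    print(f"{name}: p = {p:.4f}")
```

Neither p-value is "wrong" - they answer the same question with different amounts of evidence.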

Normality Tests in Python

Python's scipy.stats module has everything you need to run normality tests in a few lines of code.

For all examples below, I'll use the same dataset - 100 samples drawn from a normal distribution - so you can run the code and follow along.

import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=100)

Shapiro-Wilk test

Use shapiro() as your first check, especially with smaller datasets.

stat, p_value = stats.shapiro(data)
print(f"Statistic: {stat:.4f}, p-value: {p_value:.4f}")

This is what you get out:

Output of a Shapiro-Wilk test in Python

The p-value is well above 0.05, so we don't reject normality. The data looks normal - which makes sense, since we generated it from a normal distribution.

Kolmogorov-Smirnov test

kstest() compares your sample against a named distribution. For normality, pass "norm" along with the sample's mean and standard deviation.

stat, p_value = stats.kstest(data, 'norm', args=(data.mean(), data.std()))
print(f"Statistic: {stat:.4f}, p-value: {p_value:.4f}")

Output of a Kolmogorov-Smirnov test in Python

Again, p > 0.05 - no evidence against normality.

With this test in Python, always pass the mean and standard deviation explicitly via args. If you skip that, kstest() defaults to a standard normal (mean=0, std=1), which will give you unreliable results unless your data is already standardized.

D'Agostino-Pearson test

normaltest() tests for normality by checking skewness and kurtosis combined. It works best with larger samples.

stat, p_value = stats.normaltest(data)
print(f"Statistic: {stat:.4f}, p-value: {p_value:.4f}")

Output of a D'Agostino-Pearson test in Python

p > 0.05 again. The data passes all three tests here, but that's expected - I generated it to be normal. In practice, you'll often see these tests disagree, especially near the 0.05 boundary. When that happens, fall back on your Q-Q plot to make the call.

Normality Tests in R

R has built-in functions for normality testing - no extra packages are needed for the basics.

As with the Python examples, I'll use the same dataset throughout: 100 samples from a normal distribution.

set.seed(42)
data <- rnorm(100, mean = 0, sd = 1)

Shapiro-Wilk test

shapiro.test() is the go-to for small to medium samples. Just pass it your vector of data:

shapiro.test(data)

Output of a Shapiro-Wilk test in R

p > 0.05 - no evidence against normality. The W statistic ranges from 0 to 1, where values close to 1 indicate the data closely follows a normal distribution.

Kolmogorov-Smirnov test

ks.test() compares your sample against a theoretical distribution. For normality, specify "pnorm" and pass the sample mean and standard deviation.

ks.test(data, "pnorm", mean(data), sd(data))

Output of a Kolmogorov-Smirnov test in R

p > 0.05 again. This test in R has the same caveat as in Python: always pass mean(data) and sd(data). Skipping it would default to a standard normal, which skews the result unless your data is already standardized.

Q-Q plot

R's built-in qqnorm() and qqline() give you a Q-Q plot in two lines of code.

qqnorm(data, main = "Q-Q Plot")
qqline(data, col = "steelblue", lwd = 2)

Q-Q plot in R

qqnorm() plots your sample quantiles against theoretical normal quantiles. qqline() draws the reference line. Points following that line closely mean your data is behaving normally. Deviations at the ends signal tail issues worth investigating.

What to Do If Data Is Not Normal

If your data fails a normality test, you have a couple of solid options.

Transform the data

Sometimes the fix is to transform your data so it behaves normally, then run your original tests on the transformed values.

Log transformation is the most common choice. It works well on right-skewed data - think income, response times, or biological measurements that have a long tail on the right side. The function in Python is np.log(data), and the R equivalent is log(data).

Square root transformation is a milder option for moderate skew, and it's handy when your data contains zeros (since you can't take the log of zero). Use np.sqrt(data) in Python or sqrt(data) in R.

After transforming, re-run your normality test. If the transformed data passes, proceed with your parametric tests - just remember to interpret results in terms of the transformed scale.
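Here's what that workflow looks like in Python. The sketch generates right-skewed (lognormal) data, which fails Shapiro-Wilk, then re-tests after a log transform - since a lognormal is just the exponential of a normal, the transformed data should pass comfortably:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Right-skewed data: lognormal, i.e. exp(normal)
skewed = rng.lognormal(mean=0, sigma=1, size=100)

_, p_raw = stats.shapiro(skewed)
_, p_log = stats.shapiro(np.log(skewed))

print(f"raw: p = {p_raw:.4f}, log-transformed: p = {p_log:.4f}")
```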

Use non-parametric tests

If transformation doesn't work or doesn't make sense for your data, switch to non-parametric tests. These don't assume normality - they rank the data instead of working with raw values.

  • Mann-Whitney U test is the non-parametric alternative to the independent samples t-test. Use it when you're comparing two groups
  • Kruskal-Wallis test is the non-parametric version of one-way ANOVA. Use it when you're comparing three or more groups

Both are available in scipy.stats (mannwhitneyu() and kruskal()) and in R's base package (wilcox.test() and kruskal.test()).
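As a sketch of the two-group case, here's `mannwhitneyu` on two exponentially distributed groups - the kind of skewed data where a t-test's normality assumption would be shaky (the scale parameters are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two right-skewed groups with different scales
group_a = rng.exponential(scale=1.0, size=40)
group_b = rng.exponential(scale=1.5, size=40)

# Mann-Whitney U compares the rank distributions of the two groups
stat, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")
```

Swapping in `stats.kruskal(group_a, group_b, group_c)` handles the three-or-more-groups case the same way.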

Rely on large sample sizes

With large enough samples, you can often skip the normality concern.

The central limit theorem says that as your sample size grows, the sampling distribution of the mean approaches normal - regardless of how the original data is distributed. In practice, this means parametric tests tend to be reliable with large samples even when the underlying data isn't perfectly normal.
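You can watch the central limit theorem work in a few lines. This sketch draws from an exponential distribution (skewness of exactly 2), then computes the means of many samples of size 100 - the distribution of those means is far more symmetric than the raw data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Raw data: heavily right-skewed (exponential)
raw = rng.exponential(size=100_000)

# Means of 2,000 samples of size 100 from the same population
means = rng.exponential(size=(2_000, 100)).mean(axis=1)

print(f"skewness of raw data:     {stats.skew(raw):.2f}")
print(f"skewness of sample means: {stats.skew(means):.2f}")
```

This is why a t-test on a large sample can be trustworthy even when a histogram of the raw data looks nothing like a bell curve.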

Common Mistakes When Testing for Normality

Normality testing is easy - you’ve seen that it only takes one line of code. Still, there are a couple of ways to get it wrong.

Here are some common mistakes newcomers to data science often make:

  • Relying only on p-values: A p-value tells you whether a departure from normality was detected, not how large that departure is or whether it matters. Treating p > 0.05 as a green light and p < 0.05 as a red light is too blunt. Always pair your test result with a Q-Q plot
  • Ignoring sample size effects: With small samples, normality tests can miss real departures and return p > 0.05 even when your data is visibly skewed. With large samples, the test becomes so sensitive it flags tiny, meaningless deviations as statistically significant. Sample size changes what a given p-value means in practice
  • Over-testing normality: Not every analysis needs a formal normality test. If you're doing exploratory work, a histogram and Q-Q plot are usually enough
  • Misinterpreting slight deviations: Real-world data is almost never perfectly normal. A minor departure from the reference line on a Q-Q plot, or a p-value sitting just under 0.05, doesn't mean your data is far from normal. The question is whether it's normal enough for the test you're running

So to conclude, normality testing is just a single check of your data. Use it as one input among many, not as the final word.

When You Can Skip Normality Testing

Normality testing isn't always necessary. If you’re under a deadline, knowing when to skip it can save you time without affecting results.

Large datasets

When you have a large sample, the central limit theorem guarantees that the sampling distribution of the mean is approximately normal, regardless of the shape of your raw data. Parametric tests are generally reliable in this situation, so running a formal normality test adds little value.

Some statistical methods are also robust to non-normality. Techniques like linear regression tend to hold up well when sample sizes are reasonable, and violations aren't extreme. (Linear regression still assumes normality in the residuals.)
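Since it's the residuals that linear regression assumes are normal, that's where the check belongs - not on the raw outcome variable. A sketch with synthetic data (the coefficients and noise level are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A simple linear relationship with normal noise
x = rng.uniform(0, 10, size=200)
y = 2.5 * x + 1.0 + rng.normal(scale=2.0, size=200)

# Fit a line, then test the residuals - not y itself - for normality
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

_, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk on residuals: p = {p:.4f}")
```

Testing `y` directly would be misleading here: it mixes the trend with the noise, so it can look non-normal even when the model's actual assumption holds.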

Exploratory analysis

When you're scanning data for patterns, building intuition, or deciding which variables to investigate further, a quick histogram or Q-Q plot is enough. Formal tests are for confirmatory analysis - when your conclusions need to hold up.

Remember that normality testing exists to protect you from drawing wrong conclusions. If you're in a context where a wrong conclusion doesn't carry real consequences, or where your method doesn't depend on normality, the test is optional.

Conclusion

Normality testing is all about checking if your assumptions hold well enough to trust your results.

No dataset is perfectly normal. The goal is to understand how your data behaves and choose your methods accordingly. A Q-Q plot tells you where the deviations are. A formal test informs you whether they're statistically detectable. When combined, they give you a clearer picture than either one alone.

The right test depends on your context. Use Shapiro-Wilk for small samples, Anderson-Darling when tails matter, non-parametric alternatives when normality can't be assumed. And sometimes - with large samples or robust methods - no test at all.

Do you find the entire concept of p-values confusing? Read our Hypothesis Testing Made Easy article to make sure you’re interpreting them correctly.


Author
Dario Radečić
Senior Data Scientist based in Croatia. Top Tech Writer with over 700 articles published, generating more than 10M views. Author of the book Machine Learning Automation with TPOT.

Normality Test FAQs

What is a normality test?

A normality test is a statistical method that checks whether your data follows a normal (Gaussian) distribution. Most common statistical tests - like t-tests, ANOVA, and linear regression - assume normality, so checking this assumption before running your analysis helps you avoid drawing incorrect conclusions.

Do I always need to test for normality?

Not always. With large samples, the central limit theorem makes parametric tests reliable regardless of the underlying distribution. For exploratory analysis, a quick histogram or Q-Q plot is usually enough - formal normality tests are most useful when you're doing confirmatory analysis, and your conclusions need to hold up.

What should I do if my data fails a normality test?

You have a couple of options. You can transform the data using a log or square root transformation, then re-test. If the transformation doesn't work, switch to non-parametric tests like Mann-Whitney U (for two groups) or Kruskal-Wallis (for three or more groups), which don't assume normality.

What's the difference between the Shapiro-Wilk and Kolmogorov-Smirnov tests?

Shapiro-Wilk is designed specifically for normality testing and works best with small to medium samples. The Kolmogorov-Smirnov test is more general - it can compare a sample against any theoretical distribution, not just normal - but it's less powerful than Shapiro-Wilk for normality testing specifically, making it more likely to miss subtle departures.

How do I interpret a Q-Q plot for normality?

A Q-Q plot compares your data's quantiles against the quantiles of a theoretical normal distribution. If the points fall close to the diagonal reference line, your data is behaving normally. Deviations at the ends of the line signal tail issues - an S-shaped curve points to skewness, while points curving away from the line at both ends suggest heavier or lighter tails than a normal distribution would have.
