Have you ever run a t-test, got a clean p-value, and then realized you never checked if your data was normally distributed?
Statistical tests don't tell you when their assumptions are violated. They just report a number. The problem is that tests like t-tests and ANOVA assume your data follows a normal distribution. If that's not the case, you're building conclusions on shaky foundations.
Normality tests give you a way to verify that assumption. There are both visual and statistical methods for doing this, and knowing which to use - and how to read the output - is what lets you stand confidently behind your results.
In this article, I'll walk you through the most common visual and statistical methods for checking normality, show you how to run them in Python and R, and explain what to do when your data doesn't pass the test.
What Is a Normal Distribution in Practice?
You've probably seen the bell curve before - but here's what it actually means for your data.
A normal distribution is a pattern where most values cluster around the center, and fewer values appear as you move further out in either direction. Plot it, and you get a symmetric, bell-shaped curve. The left side mirrors the right side.

Normal distribution plot
What makes the normal distribution unique is that the mean, median, and mode all land on the same point - the center of the bell. There's no skew to the left or right. In other words, the data is balanced.
This shows up constantly in real-world measurement data. Human height, blood pressure readings, manufacturing tolerances, test scores - these all tend to follow a normal distribution when you collect enough samples. Natural variation in biological and physical systems tends to produce this shape.
That said, not all data behaves this way. Income data skews right. Website response times have long tails.
In the real world, things can go horribly wrong if you assume normality without checking.
Why Testing for Normality Matters
The problem with not checking for normality is that most common statistical tests - t-tests, ANOVA - are parametric tests.
That means they're built on assumptions about your data's distribution. Normality is one of them. When that assumption breaks, the test's math breaks with it. You’ll still get the result from the test, but it might lead you to wrong conclusions.
Parametric tests work by making mathematical assumptions about the population your sample comes from. When those assumptions hold, these tests are useful and accurate. When they don't, your p-values become unreliable and you can’t make accurate conclusions.
That's where non-parametric tests come in.
Tests like Mann-Whitney U or Kruskal-Wallis don't assume normality - they work with ranks instead of raw values. They're more flexible, but they also tend to have less statistical power when your data actually is normal. So switching to them unnecessarily isn't the answer.
The real mistake so many newcomers to data science make is skipping the check entirely.
Normality testing takes a few lines of code. Not testing means you're either trusting your data - or not thinking about it at all.
Visual Methods to Check Normality
Before running any formal test, plot your data. Visuals will tell you a lot about what you're working with.
Histogram
A histogram shows you the shape of your distribution.

Example histogram
If your data is normally distributed, the histogram should look like a bell curve - tall in the middle, tapering symmetrically on both sides. What you're watching for is skewness: a long tail pulling to the right means positive skew, a tail pulling left means negative skew. Either way, that's a sign your data might not be normal.
The problem with histograms is that their shape depends on the bin size:
- Too few bins and the distribution looks flat
- Too many and it looks jagged
Always try a couple of bin sizes before drawing conclusions.
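To see the binning effect numerically, here's a small sketch using NumPy on synthetic data with a fixed seed (matplotlib's plt.hist(data, bins=...) draws these same counts as bars):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=0, scale=1, size=1000)

# The same data binned three ways: the shape you perceive changes with bin count
for bins in (5, 15, 60):
    counts, _ = np.histogram(data, bins=bins)
    print(f"{bins:>2} bins -> tallest bar holds {counts.max()} points")
```

With 5 bins the distribution looks like a few flat blocks; with 60 it looks jagged; somewhere in between, the bell shape is clearest.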
Q-Q plot
A Q-Q plot (quantile-quantile plot) compares your data's quantiles against the quantiles of a theoretical normal distribution.

Example Q-Q plot
If your data is normal, the points fall along a straight diagonal line. Deviations from that line tell you where normality breaks down. Points curving upward at the ends suggest heavy tails. An S-shaped curve points to skewness.
Q-Q plots are more precise than histograms for spotting subtle departures from normality - especially in the tails, where histograms tend to miss things.
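If you want a numeric companion to the visual check, scipy.stats.probplot computes the quantile pairs a Q-Q plot is built from (pass plot=plt from matplotlib to have it drawn). A sketch on synthetic normal data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(size=100)

# probplot returns the theoretical and sample quantiles plus a fitted line
(theoretical_q, ordered_vals), (slope, intercept, r) = stats.probplot(data, dist="norm")

# r is the correlation between the points and the reference line;
# values near 1 mean the points hug the diagonal
print(f"r = {r:.4f}")
```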
Box plot
A box plot shows you the median, spread, and outliers in one view.

Example box plot
A normally distributed dataset produces a box plot where the median sits roughly in the center of the box and the whiskers extend to about equal lengths on both sides. If the median is off-center, or one whisker is much longer than the other, that's skew. Dots outside the whiskers are outliers.
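As a sketch of what the plot is built from, here are the box-plot components computed directly with NumPy on synthetic data (matplotlib's plt.boxplot draws these same quantities):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=1000)

# Quartiles define the box; the standard whiskers reach 1.5 * IQR beyond them
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

# For symmetric data the median sits near the middle of the box
print(f"median offset within box: {abs((median - q1) - (q3 - median)):.3f}")
print(f"outliers beyond the whiskers: {len(outliers)}")
```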
The general issue with visuals is that they're subjective. Two people can look at the same histogram and disagree. Use them to get a feel for your data first, then confirm with a formal test.
Common Normality Tests in Statistics
There's no single normality test that works best in every situation. The right one depends on your sample size and what you're trying to detect.
Shapiro-Wilk test
The Shapiro-Wilk test is the go-to choice for small to medium samples, generally up to a few hundred observations.
It measures how closely your data matches a normal distribution by comparing the observed values against what you'd expect if the data were normal. It's widely used, well-understood, and available in every major stats library. For most analysts, this is the first test to reach for.
Its main limitation is that it becomes oversensitive at large sample sizes. It tends to flag tiny, practically meaningless deviations as statistically significant.
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov (KS) test compares your sample's cumulative distribution against a theoretical one - in this case, normal.
It's more general than Shapiro-Wilk and can test against any distribution, not just normal. The KS test is less powerful than Shapiro-Wilk for normality testing, meaning it's less likely to catch subtle departures. It also requires you to specify the distribution parameters upfront, which introduces bias if you estimate them from the same data.
Use it when you need a quick, general-purpose check - not as your primary normality test.
Anderson-Darling test
The Anderson-Darling test is a variation of the KS test, but with one key difference: it gives more weight to the tails of the distribution.
This makes it better at catching deviations that show up at the extremes - heavy tails, outliers, or non-normal behavior that the KS test would miss. If your use case is sensitive to tail behavior, Anderson-Darling is a good choice.
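In Python, scipy.stats.anderson runs this test. One detail that trips people up: it returns critical values rather than a p-value. A sketch on synthetic normal data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(size=100)

result = stats.anderson(data, dist='norm')
print(f"Statistic: {result.statistic:.4f}")

# Reject normality at a given significance level when the statistic
# exceeds that level's critical value
for cv, sig in zip(result.critical_values, result.significance_level):
    verdict = "reject" if result.statistic > cv else "fail to reject"
    print(f"{sig:>4}% level: critical value {cv:.3f} -> {verdict}")
```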
D'Agostino-Pearson test
The D'Agostino-Pearson test takes a different approach.
Instead of comparing distributions directly, it measures two properties of your data: skewness (asymmetry) and kurtosis (how heavy or light the tails are).
It combines both into a single test statistic. This makes it good at pinpointing why your data might not be normal - not just whether it is. It works best with larger samples, where skewness and kurtosis estimates are reliable.
Jarque-Bera test
The Jarque-Bera test also uses skewness and kurtosis, similar to D'Agostino-Pearson.
It's common in econometrics and time series analysis. Like D'Agostino-Pearson, it needs a reasonably large sample to produce reliable results; with small samples it can be misleading. If you're working in a finance or economics context, you'll likely see this one often.
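In Python, scipy.stats.jarque_bera runs this test. A sketch on synthetic data, contrasting a normal sample with a right-skewed one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_data = rng.normal(size=500)       # JB wants a reasonably large sample
skewed_data = rng.exponential(size=500)  # strong positive skew

stat_n, p_n = stats.jarque_bera(normal_data)
stat_s, p_s = stats.jarque_bera(skewed_data)

# The skewed sample should be flagged far more strongly than the normal one
print(f"normal data: p = {p_n:.4f}")
print(f"skewed data: p = {p_s:.4g}")
```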
To conclude, start with Shapiro-Wilk for small samples and pair it with a Q-Q plot. Use Anderson-Darling when tail behavior matters, and D'Agostino-Pearson when you want to understand the nature of the deviation.
How to Interpret Normality Test Results
Every normality test is a hypothesis test.
The null hypothesis in any normality test is that your data is normally distributed. The test then asks: if the data really were normal, how likely would we be to see a sample that deviates this much?
The answer comes back as a p-value:
- p > 0.05 - you don't have enough evidence to reject normality. Treat the data as consistent with a normal distribution and proceed with parametric tests
- p < 0.05 - the data differs from normality enough to be statistically detectable. Reject the normality assumption
Sounds simple, but many people go wrong here.
A low p-value doesn't tell you how non-normal your data is - only that a difference was detected. With large samples, normality tests become extremely sensitive. They'll flag deviations so small they have no real impact on your analysis.
The opposite problem also exists. With small samples, even visibly skewed data can produce p > 0.05 because the test doesn't have enough power to detect the deviation.
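Here's a sketch of both effects using Shapiro-Wilk on synthetic data. The exact p-values depend on the random draw, so treat this as illustrative rather than definitive:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Small sample from a clearly skewed (exponential) distribution
_, p_small = stats.shapiro(rng.exponential(size=12))

# Large sample from an almost-normal distribution (t with 30 df)
_, p_large = stats.shapiro(rng.standard_t(df=30, size=4000))

# Small samples often "pass" despite real skew, while large samples
# can flag deviations too small to matter in practice
print(f"small skewed sample:      p = {p_small:.4f}")
print(f"large near-normal sample: p = {p_large:.4f}")
```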
Statistical significance and practical significance aren't the same thing.
A p-value tells you whether a departure from normality exists. It doesn't tell you whether that departure matters for your specific analysis. Always pair your test result with a Q-Q plot - if the points follow the line closely, your data is probably normal enough, regardless of what the p-value says.
Normality Tests in Python
Python's scipy.stats module has everything you need to run normality tests in a few lines of code.
For all examples below, I'll use the same dataset - 100 samples drawn from a normal distribution - so you can run the code and follow along.
import numpy as np
from scipy import stats
np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=100)
Shapiro-Wilk test
Use shapiro() as your first check, especially with smaller datasets.
stat, p_value = stats.shapiro(data)
print(f"Statistic: {stat:.4f}, p-value: {p_value:.4f}")
This is what you get out:

Output of a Shapiro-Wilk’s test in Python
The p-value is well above 0.05, so we don't reject normality. The data looks normal - which makes sense, since we generated it from a normal distribution.
Kolmogorov-Smirnov test
kstest() compares your sample against a named distribution. For normality, pass "norm" along with the sample's mean and standard deviation.
stat, p_value = stats.kstest(data, 'norm', args=(data.mean(), data.std()))
print(f"Statistic: {stat:.4f}, p-value: {p_value:.4f}")

Output of a Kolmogorov-Smirnov test in Python
Again, p > 0.05 - no evidence against normality.
With this test in Python, always pass the mean and standard deviation explicitly via args. If you skip that, kstest() defaults to a standard normal (mean=0, std=1), which will give you unreliable results unless your data is already standardized.
D'Agostino-Pearson test
normaltest() tests for normality by checking skewness and kurtosis combined. It works best with larger samples.
stat, p_value = stats.normaltest(data)
print(f"Statistic: {stat:.4f}, p-value: {p_value:.4f}")

Output of a D'Agostino-Pearson test in Python
p > 0.05 again. The data passes all three tests here, but that's expected - I generated it to be normal. In practice, you'll often see these tests disagree, especially near the 0.05 boundary. When that happens, fall back on your Q-Q plot to make the call.
Normality Tests in R
R has built-in functions for normality testing - no extra packages are needed for the basics.
As with the Python examples, I'll use the same dataset throughout: 100 samples from a normal distribution.
set.seed(42)
data <- rnorm(100, mean = 0, sd = 1)
Shapiro-Wilk test
shapiro.test() is the go-to for small to medium samples. Just pass it your vector of data:
shapiro.test(data)

Output of a Shapiro-Wilk’s test in R
p > 0.05 - no evidence against normality. The W statistic ranges from 0 to 1, where values close to 1 indicate the data closely follows a normal distribution.
Kolmogorov-Smirnov test
ks.test() compares your sample against a theoretical distribution. For normality, specify "pnorm" and pass the sample mean and standard deviation.
ks.test(data, "pnorm", mean(data), sd(data))

Output of a Kolmogorov-Smirnov test in R
p > 0.05 again. This test in R has the same caveat as in Python: always pass mean(data) and sd(data). Skipping it would default to a standard normal, which skews the result unless your data is already standardized.
Q-Q plot
R's built-in qqnorm() and qqline() give you a Q-Q plot in two lines of code.
qqnorm(data, main = "Q-Q Plot")
qqline(data, col = "steelblue", lwd = 2)

Q-Q plot in R
qqnorm() plots your sample quantiles against theoretical normal quantiles. qqline() draws the reference line. Points following that line closely mean your data is behaving normally. Deviations at the ends signal tail issues worth investigating.
What to Do If Data Is Not Normal
If your data fails a normality test, you have a couple of solid options.
Transform the data
Sometimes the fix is to transform your data so it behaves normally, then run your original tests on the transformed values.
Log transformation is the most common choice. It works well on right-skewed data - think income, response times, or biological measurements that have a long tail on the right side. The function in Python is np.log(data), and the R equivalent is log(data).
Square root transformation is a milder option for moderate skew, and it's handy when your data contains zeros (since you can't take the log of zero). Use np.sqrt(data) in Python or sqrt(data) in R.
After transforming, re-run your normality test. If the transformed data passes, proceed with your parametric tests - just remember to interpret results in terms of the transformed scale.
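Here's a sketch of that workflow on synthetic right-skewed (lognormal) data - test, transform, re-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=0, sigma=1, size=200)  # right-skewed by construction

# Shapiro-Wilk before and after a log transform
_, p_raw = stats.shapiro(skewed)
_, p_log = stats.shapiro(np.log(skewed))

print(f"raw data:        p = {p_raw:.4g}")
print(f"log-transformed: p = {p_log:.4f}")
```

The raw data fails decisively, while the log-transformed values (which are exactly normal here, since the data is lognormal) should pass.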
Use non-parametric tests
If transformation doesn't work or doesn't make sense for your data, switch to non-parametric tests. These don't assume normality - they rank the data instead of working with raw values.
- Mann-Whitney U test is the non-parametric alternative to the independent samples t-test. Use it when you're comparing two groups
- Kruskal-Wallis test is the non-parametric version of one-way ANOVA. Use it when you're comparing three or more groups
Both are available in scipy.stats (mannwhitneyu() and kruskal()) and in R's base package (wilcox.test() and kruskal.test()).
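A minimal sketch with synthetic skewed groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.exponential(scale=1.0, size=50)
group_b = rng.exponential(scale=2.0, size=50)
group_c = rng.exponential(scale=1.0, size=50)

# Two groups: Mann-Whitney U instead of the independent samples t-test
u_stat, p_two = stats.mannwhitneyu(group_a, group_b)

# Three or more groups: Kruskal-Wallis instead of one-way ANOVA
h_stat, p_three = stats.kruskal(group_a, group_b, group_c)

print(f"Mann-Whitney U: p = {p_two:.4f}")
print(f"Kruskal-Wallis: p = {p_three:.4f}")
```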
Rely on large sample sizes
With large enough samples, you can often skip the normality concern.
The central limit theorem says that as your sample size grows, the sampling distribution of the mean approaches normal - regardless of how the original data is distributed. In practice, this means parametric tests tend to be reliable with large samples even when the underlying data isn't perfectly normal.
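A quick simulation makes this concrete: draw repeated samples from a heavily skewed population and look at the distribution of their means (synthetic data, fixed seed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# A heavily right-skewed "population"
population = rng.exponential(scale=1.0, size=100_000)

# Sampling distribution of the mean for samples of size 100
sample_means = np.array([rng.choice(population, size=100).mean()
                         for _ in range(2000)])

# The raw data is strongly skewed; the sample means are nearly symmetric
print(f"skew of raw data:     {stats.skew(population):.2f}")
print(f"skew of sample means: {stats.skew(sample_means):.2f}")
```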
Common Mistakes When Testing for Normality
Normality testing is easy - you’ve seen that it only takes one line of code. Still, there are a couple of ways to get it wrong.
Here are some common mistakes newcomers to data science often make:
- Relying only on p-values: A p-value tells you whether a departure from normality was detected, not how large that departure is or whether it matters. Treating p > 0.05 as a green light and p < 0.05 as a red light is too blunt. Always pair your test result with a Q-Q plot
- Ignoring sample size effects: With small samples, normality tests can miss real departures and return p > 0.05 even when your data is visibly skewed. With large samples, the test becomes so sensitive it flags tiny, meaningless deviations as statistically significant. Sample size changes what a given p-value means
- Over-testing normality: Not every analysis needs a formal normality test. If you're doing exploratory work, a histogram and Q-Q plot are usually enough
- Misinterpreting slight deviations: Real-world data is almost never perfectly normal. A minor departure from the reference line on a Q-Q plot, or a p-value sitting just under 0.05, doesn't mean your data is far from normal. The question is whether it's normal enough for the test you're running
So to conclude, normality testing is just a single check of your data. Use it as one input among many, not as the final word.
When You Can Skip Normality Testing
Normality testing isn't always necessary. If you’re under a deadline, knowing when to skip it can save you time without affecting results.
Large datasets
When you have a large sample, the central limit theorem guarantees that the sampling distribution of the mean is approximately normal, regardless of the shape of your raw data. Parametric tests are generally reliable in this situation, so running a formal normality test adds little value.
Some statistical methods are also robust to non-normality. Techniques like linear regression tend to hold up well when sample sizes are reasonable, and violations aren't extreme. (Linear regression still assumes normality in the residuals.)
Exploratory analysis
When you're scanning data for patterns, building intuition, or deciding which variables to investigate further, a quick histogram or Q-Q plot is enough. Formal tests are for confirmatory analysis - when your conclusions need to hold up.
Remember that normality testing exists to protect you from drawing wrong conclusions. If you're in a context where a wrong conclusion doesn't carry real consequences, or where your method doesn't depend on normality, the test is optional.
Conclusion
Normality testing is all about checking if your assumptions hold well enough to trust your results.
No dataset is perfectly normal. The goal is to understand how your data behaves and choose your methods accordingly. A Q-Q plot tells you where the deviations are. A formal test informs you whether they're statistically detectable. When combined, they give you a clearer picture than either one alone.
The right test depends on your context. Use Shapiro-Wilk for small samples, Anderson-Darling when tails matter, non-parametric alternatives when normality can't be assumed. And sometimes - with large samples or robust methods - no test at all.
Do you find the entire concept of p-values confusing? Read our Hypothesis Testing Made Easy article to make sure you’re interpreting them correctly.
Normality Test FAQs
What is a normality test?
A normality test is a statistical method that checks whether your data follows a normal (Gaussian) distribution. Most common statistical tests - like t-tests, ANOVA, and linear regression - assume normality, so checking this assumption before running your analysis helps you avoid drawing incorrect conclusions.
Do I always need to test for normality?
Not always. With large samples, the central limit theorem makes parametric tests reliable regardless of the underlying distribution. For exploratory analysis, a quick histogram or Q-Q plot is usually enough - formal normality tests are most useful when you're doing confirmatory analysis, and your conclusions need to hold up.
What should I do if my data fails a normality test?
You have a couple of options. You can transform the data using a log or square root transformation, then re-test. If the transformation doesn't work, switch to non-parametric tests like Mann-Whitney U (for two groups) or Kruskal-Wallis (for three or more groups), which don't assume normality.
What's the difference between the Shapiro-Wilk and Kolmogorov-Smirnov tests?
Shapiro-Wilk is designed specifically for normality testing and works best with small to medium samples. The Kolmogorov-Smirnov test is more general - it can compare a sample against any theoretical distribution, not just normal - but it's less powerful than Shapiro-Wilk for normality testing specifically, making it more likely to miss subtle departures.
How do I interpret a Q-Q plot for normality?
A Q-Q plot compares your data's quantiles against the quantiles of a theoretical normal distribution. If the points fall close to the diagonal reference line, your data is behaving normally. Deviations at the ends of the line signal tail issues - an S-shaped curve points to skewness, while points curving away from the line at both ends suggest heavier or lighter tails than a normal distribution would have.


