
Kruskal-Wallis Test: Comparing Multiple Groups Without Normality

A practical guide to the Kruskal-Wallis test - what it is, how it works, when to use it over ANOVA, and how to run and interpret it in Python and R.
May 4, 2026  · 9 min read

Comparing multiple groups is easy when your data follows a normal distribution. The problem is, most real-world data doesn’t.

If ANOVA is your default test, you can end up with the wrong conclusions, because it assumes your data follows a normal distribution. When it doesn't - think heavily skewed data, especially in small samples - you need a different approach.

The Kruskal-Wallis test is that different approach. It’s a nonparametric alternative to ANOVA, and it works on ranked data instead of raw values, so a normal distribution isn't a requirement.

In this article, I'll cover the concept, the math behind it, how to run it in Python and R, and how to interpret the results.

What Is the Kruskal-Wallis Test?

The Kruskal-Wallis test is a nonparametric method for comparing three or more independent groups. It converts all observations into ranks and compares those ranks across groups instead of working with raw values.

You can think of it as an extension of the Mann-Whitney U test, which I've also written about.

The Mann-Whitney U does the same rank-based comparison, but only for two groups. The Kruskal-Wallis test scales it to three or more, so when you have multiple groups and can't use ANOVA, this is what you should use.
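To make that relationship concrete, here's a minimal sketch that runs both tests on two small samples of illustrative scores. It assumes SciPy 1.7 or newer (for the `method` argument) and turns off the continuity correction in the Mann-Whitney U test, since that's the setting under which the two p-values should line up.

```python
from scipy import stats

# Two small groups of illustrative exam scores
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]

# Kruskal-Wallis accepts two groups as well as three or more
kw = stats.kruskal(class_a, class_b)

# Mann-Whitney U with the normal approximation and no continuity correction
mwu = stats.mannwhitneyu(
    class_a,
    class_b,
    alternative="two-sided",
    use_continuity=False,
    method="asymptotic",
)

print(f"Kruskal-Wallis p-value: {kw.pvalue:.6f}")
print(f"Mann-Whitney U p-value: {mwu.pvalue:.6f}")
```

With these settings, the two tests are the same rank-based comparison expressed in two ways, so the p-values agree.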

Because it works on ranks rather than raw values, it doesn't assume your data follows any particular distribution. That's what makes it useful with real-world data, which rarely follows a textbook distribution perfectly.

When to Use the Kruskal-Wallis Test

The Kruskal-Wallis test is a great fit when you're dealing with:

  • Three or more independent groups you want to compare
  • Ordinal or continuous data such as Likert scale ratings or measurement data
  • Non-normal distributions, such as skewed data or data with outliers that ANOVA handles poorly
  • Small sample sizes where normality is hard to verify

Here’s a simple example.

Imagine you want to compare exam scores across three different classes. The scores are skewed and the samples are small, so ANOVA isn’t a good choice. The Kruskal-Wallis test doesn't need normality, so it works here. It'll tell you whether at least one class scored differently from the others without making assumptions your data can't support.

Kruskal-Wallis Test vs. ANOVA

Both tests compare groups, but they do it differently.

ANOVA compares group means and assumes your data is normally distributed with roughly equal variances. When those assumptions are true, it's the better choice - it's more statistically powerful and the results are easier to interpret.

The Kruskal-Wallis test compares group distributions using ranks. It doesn't require normality, and similar spreads only matter if you want to read the result as a comparison of medians. That makes it more flexible, but you lose some statistical power when ANOVA's assumptions actually hold.

Here's a quick comparison table:

ANOVA compared to Kruskal-Wallis test


If your data is normally distributed, use ANOVA. If it isn't - or you can't verify that it is - use Kruskal-Wallis.
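If you want a quick way to check normality before choosing, one option is a Shapiro-Wilk test per group. The sketch below uses `scipy.stats.shapiro` on illustrative exam scores. The choice of Shapiro-Wilk and the 0.05 cutoff are my assumptions here, not a fixed rule, and with only five observations per group the test has very little power, so a non-significant result isn't proof of normality.

```python
from scipy import stats

# Illustrative exam scores for three classes
samples = {
    "A": [78, 85, 90, 72, 88],
    "B": [65, 70, 68, 74, 60],
    "C": [88, 92, 95, 85, 91],
}

normality_p = {}
for name, values in samples.items():
    # Null hypothesis: the sample comes from a normal distribution
    statistic, p_value = stats.shapiro(values)
    normality_p[name] = p_value
    print(f"Class {name}: W = {statistic:.4f}, p = {p_value:.4f}")

# A small p-value in any group is a reason to prefer Kruskal-Wallis
any_non_normal = any(p < 0.05 for p in normality_p.values())
print("Evidence against normality in at least one group:", any_non_normal)
```

With samples this small, I'd treat the output as a rough signal and lean on Kruskal-Wallis whenever normality can't be verified.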

Kruskal-Wallis Test Formula

The Kruskal-Wallis test boils down to a single test statistic, H. Here's the formula:

Kruskal-Wallis formula


Here’s the explanation of the components:

  • N - total number of observations across all groups

  • k - number of groups

  • n_i - number of observations in group i

  • R_i - sum of ranks assigned to group i

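The formula measures how much the rank sums of each group deviate from what you'd expect if all groups were identical. A large H means the rank sums deviate more than chance alone would explain, which is evidence that at least one group differs. A small H means the rank sums are about what you'd expect from identical groups.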

Once you have H, you compare it against a chi-square distribution with k - 1 degrees of freedom to get a p-value.
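Written out in the notation above, the standard textbook form of the statistic is shown below. I'm also including the usual tie-corrected version, where t_j is the number of observations sharing the j-th tied value. That symbol is my own addition and isn't part of the component list above.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)

H_{\text{corrected}} = \frac{H}{1 - \dfrac{\sum_{j} \left(t_j^3 - t_j\right)}{N^3 - N}}
```

When there are no ties, the denominator of the correction is 1 and the two versions are identical.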

How the Kruskal-Wallis Test Works

There are four steps needed to perform the Kruskal-Wallis test:

  1. Combine all groups: Take all observations from every group and combine them into a single dataset
  2. Rank all observations: Sort the combined data from smallest to largest and assign ranks. The smallest value gets rank 1, the next gets rank 2, and so on. If two values are equal, they share the average of the ranks they would have occupied.
  3. Compute rank sums: Split the ranks into their original groups. Add up the ranks for each group. These are your rank sums - R_i in the formula
  4. Calculate the test statistic: Plug the rank sums into the H formula. If the groups are similar, their rank sums will be close to each other and H will be small. If one group consistently gets higher or lower ranks, H grows larger

And that's it!

As you can see, the test doesn't care about the actual values, only about where each one sits relative to all the others.
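If you'd like to see the four steps as code, here's a minimal sketch that follows them by hand on illustrative exam scores and then compares the result with `scipy.stats.kruskal`. The variable names are my own, and the tie correction at the end is there because SciPy applies one when values repeat.

```python
import numpy as np
from scipy import stats

# Illustrative exam scores for three classes
groups = {
    "A": [78, 85, 90, 72, 88],
    "B": [65, 70, 68, 74, 60],
    "C": [88, 92, 95, 85, 91],
}

# Steps 1-2: combine all groups and rank them (ties share the average rank)
pooled = np.concatenate(list(groups.values()))
ranks = stats.rankdata(pooled)

# Step 3: split the ranks back into groups and sum them
sizes = [len(values) for values in groups.values()]
split_points = np.cumsum(sizes)[:-1]
rank_sums = [chunk.sum() for chunk in np.split(ranks, split_points)]

# Step 4: plug the rank sums into the H formula
n_total = len(pooled)
h_raw = (
    12 / (n_total * (n_total + 1))
    * sum(r**2 / n for r, n in zip(rank_sums, sizes))
    - 3 * (n_total + 1)
)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
_, counts = np.unique(pooled, return_counts=True)
tie_correction = 1 - (counts**3 - counts).sum() / (n_total**3 - n_total)
h = h_raw / tie_correction

# p-value from a chi-square distribution with k - 1 degrees of freedom
p_value = stats.chi2.sf(h, df=len(groups) - 1)

reference = stats.kruskal(*groups.values())
print("Rank sums:", rank_sums)
print(f"H by hand: {h:.4f}, SciPy: {reference.statistic:.4f}")
print(f"p by hand: {p_value:.4f}, SciPy: {reference.pvalue:.4f}")
```

The hand-computed H and p-value should match SciPy's, which is a handy sanity check that the formula and the library are doing the same thing.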

Kruskal-Wallis Test in Python

Python's scipy library has a built-in function for the Kruskal-Wallis test, meaning you don’t have to implement the formula by hand. Let's go through an example.

Say you're comparing exam scores across three classes. Here's how you'd run the test:

from scipy import stats

# Exam scores
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]
class_c = [88, 92, 95, 85, 91]

# Run the test
statistic, p_value = stats.kruskal(class_a, class_b, class_c)

print(f"H statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")

Python output


The p-value is below 0.05, which means at least one class scored differently from the others. Just keep in mind the test doesn't tell you which one - you'll need a post hoc test for that, which I'll cover later in the article.

Kruskal-Wallis Test in R

Just like Python, R has a built-in function for this test. Let's use the same exam score scenario.

# Exam scores
class_a <- c(78, 85, 90, 72, 88)
class_b <- c(65, 70, 68, 74, 60)
class_c <- c(88, 92, 95, 85, 91)

# Combine
scores <- c(class_a, class_b, class_c)
groups <- factor(rep(c("A", "B", "C"), each = 5))

# Run the test
kruskal.test(scores ~ groups)

R output


The output is the same as what I got in Python - same H statistic, same p-value. With p < 0.05, you'd reject the null hypothesis and conclude that at least one group differs.

How to Interpret Kruskal-Wallis Results

The null hypothesis of the Kruskal-Wallis test is that all groups have the same distribution. The p-value tells you whether to reject it. Here’s how to interpret it:

  • p < 0.05: At least one group differs from the others, so reject the null hypothesis
  • p >= 0.05: There is no strong evidence that the groups differ, so don't reject the null hypothesis

The 0.05 threshold is a convention. Depending on your field or the stakes of your analysis, you might use a stricter threshold like 0.01 or a looser one like 0.10.

Keep in mind this test won’t tell you which group is different. A significant result just means the groups aren't all the same. You know something is going on, but not where. To find out which pairs are driving the difference, you need a post hoc test.
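Here's a small sketch of that decision rule in code, checking the same p-value against the three thresholds mentioned above. The scores are the illustrative exam data, and the loop over `alpha` values is just my way of showing how the verdict depends on the threshold you commit to.

```python
from scipy import stats

# Illustrative exam scores for three classes
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]
class_c = [88, 92, 95, 85, 91]

result = stats.kruskal(class_a, class_b, class_c)

decisions = {}
for alpha in (0.10, 0.05, 0.01):
    # Reject the null hypothesis only when p falls below the chosen threshold
    reject = bool(result.pvalue < alpha)
    decisions[alpha] = reject
    verdict = "reject" if reject else "fail to reject"
    print(f"alpha = {alpha:.2f}: {verdict} the null hypothesis (p = {result.pvalue:.4f})")
```

Pick the threshold before you look at the p-value. Choosing it afterwards defeats the purpose.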

Post Hoc Tests After Kruskal-Wallis

A significant Kruskal-Wallis result tells you that at least one group differs, but not which one. If you have three groups and p < 0.05, the difference could be between A and B, A and C, B and C, or some combination. You need a post hoc test to get these pairwise comparisons.

Dunn's test is the most common choice. It runs pairwise comparisons between all groups and adjusts the p-values to account for multiple comparisons - without that adjustment, you'd inflate the chance of a false positive. The more comparisons you run, the higher the risk of finding a "significant" result by chance alone.
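To show what the adjustment does, here's a rough sketch of Dunn's pairwise comparisons computed by hand with SciPy and NumPy, using the illustrative exam scores. It assumes the usual z-statistic on mean-rank differences with a tie-corrected variance, two-sided p-values, and a Bonferroni adjustment (multiply each p-value by the number of comparisons, capped at 1). Treat it as an illustration of the idea, not a replacement for a tested library.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Illustrative exam scores for three classes
groups = {
    "A": [78, 85, 90, 72, 88],
    "B": [65, 70, 68, 74, 60],
    "C": [88, 92, 95, 85, 91],
}

# Rank all observations together, then take the mean rank per group
pooled = np.concatenate(list(groups.values()))
ranks = stats.rankdata(pooled)
n_total = len(pooled)

mean_ranks, sizes = {}, {}
start = 0
for name, values in groups.items():
    stop = start + len(values)
    mean_ranks[name] = ranks[start:stop].mean()
    sizes[name] = len(values)
    start = stop

# Variance term shared by all pairs, reduced slightly when ties are present
_, counts = np.unique(pooled, return_counts=True)
tie_term = (counts**3 - counts).sum() / (12 * (n_total - 1))
base_variance = n_total * (n_total + 1) / 12 - tie_term

pairs = list(combinations(groups, 2))
adjusted = {}
for a, b in pairs:
    standard_error = np.sqrt(base_variance * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_ranks[a] - mean_ranks[b]) / standard_error
    raw_p = 2 * stats.norm.sf(abs(z))  # two-sided p-value
    # Bonferroni: multiply by the number of comparisons, cap at 1
    adjusted[(a, b)] = min(1.0, raw_p * len(pairs))
    print(f"{a} vs {b}: raw p = {raw_p:.4f}, adjusted p = {adjusted[(a, b)]:.4f}")
```

Notice how each raw p-value is tripled here because there are three comparisons. That's the price you pay for looking at every pair.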

Dunn's test in Python

You'll need the scikit_posthocs library for this. If you don't have it, install it with pip install scikit-posthocs.

From there, the calculation is simple:

import scikit_posthocs as sp
import pandas as pd

# Same exam scores as before
class_a = [78, 85, 90, 72, 88]
class_b = [65, 70, 68, 74, 60]
class_c = [88, 92, 95, 85, 91]

# Combine
scores = class_a + class_b + class_c
groups = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

df = pd.DataFrame({"score": scores, "group": groups})

# Run the test
result = sp.posthoc_dunn(df, val_col="score", group_col="group", p_adjust="bonferroni")
print(result)

Dunn’s test in Python


Each cell shows the adjusted p-value for that pair. Here, only B versus C (p = 0.004) falls below the 0.05 threshold, so those two groups differ. A versus B (p = 0.167) and A versus C (p = 0.607) don't, which means class A isn't statistically different from either of the other two classes.

Dunn's test in R

To start, install the library if needed with the install.packages("dunn.test") command:

library(dunn.test)

# Same exam scores as before
class_a <- c(78, 85, 90, 72, 88)
class_b <- c(65, 70, 68, 74, 60)
class_c <- c(88, 92, 95, 85, 91)

scores <- c(class_a, class_b, class_c)
groups <- factor(rep(c("A", "B", "C"), each = 5))

# Run the test
dunn.test(scores, groups, method = "bonferroni")

Dunn’s test in R


The results match Python, as you would expect. Only B versus C is significant, while A versus B and A versus C aren't. Class B and class C are the ones behind the difference detected by the Kruskal-Wallis test.

Assumptions of the Kruskal-Wallis Test

The Kruskal-Wallis test is more flexible than ANOVA, but it still has three assumptions you need to check before running it:

  • Independent samples: Observations in one group don't influence observations in another. If your data is paired or repeated measures, this test isn't the right fit
  • Ordinal or continuous data: The test needs data you can rank. Nominal categories (like colors or labels) can't be ranked, so they won't work here
  • Similar distribution shapes: If you want to interpret the results as a comparison of medians rather than just distributions, the groups need to have roughly the same shape. If the shapes differ a lot, you can still compare distributions, but the median interpretation doesn't hold

If you violate the first two assumptions, the test results won’t be valid. The third assumption is somewhat softer, as it affects how you interpret the results, not whether you can run the test at all.

When You Should Not Use the Kruskal-Wallis Test

There are three cases where a different test would be a better fit:

  • Your data is paired or repeated measures: If the same subjects appear across groups, use the Friedman test instead. It's the nonparametric equivalent designed for dependent samples. Using Kruskal-Wallis on paired data ignores the relationship between observations and can lead to wrong conclusions
  • Your data meets ANOVA's assumptions: If your data is normally distributed with roughly equal variances, ANOVA is the better choice. It's more statistically powerful, which means it's better at detecting real differences when they exist
  • Your sample sizes are large: With large samples, parametric methods tend to work well even when the data isn't perfectly normal. The central limit theorem does its thing, and ANOVA is usually reliable and easier to interpret in terms of means. If you're working with hundreds or thousands of observations per group, Kruskal-Wallis is rarely the test you need, unless your data is ordinal or has extreme outliers
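For the paired case, here's a minimal sketch of the Friedman test with `scipy.stats.friedmanchisquare`. The data is hypothetical: I made up three quiz scores for the same five students, so each position in the lists refers to the same person.

```python
from scipy import stats

# Hypothetical repeated measures: the same five students on three quizzes
quiz_1 = [72, 65, 80, 58, 90]
quiz_2 = [75, 70, 82, 63, 91]
quiz_3 = [80, 74, 85, 70, 95]

# Each argument is one condition; positions must line up by subject
statistic, p_value = stats.friedmanchisquare(quiz_1, quiz_2, quiz_3)

print(f"Friedman statistic: {statistic:.4f}")
print(f"P-value: {p_value:.4f}")
```

The key difference from Kruskal-Wallis is that the ranking happens within each student, not across the pooled data, which is how the test accounts for the pairing.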

Conclusion

The Kruskal-Wallis test compares three or more independent groups when your data doesn't follow the normal distribution required by tests like ANOVA. This is possible because it works on ranks instead of raw values.

That said, it's not a replacement for ANOVA. If your data is normal, ANOVA is the better test because it has more statistical power. And if your data is paired, use the Friedman test. As always, the right test depends on your data.

When the conditions are right, the Kruskal-Wallis test is a reliable and straightforward choice. Run it, check the p-value, and follow up with Dunn's test if you need to know which groups are behind the difference.



Author
Dario Radečić
Senior Data Scientist based in Croatia. Top Tech Writer with over 700 articles published, generating more than 10M views. Book Author of Machine Learning Automation with TPOT.

Kruskal-Wallis Test FAQs

What is the Kruskal-Wallis test used for?

The Kruskal-Wallis test is used to compare three or more independent groups when you can't assume your data follows a normal distribution. It's a nonparametric alternative to ANOVA that works on ranked data instead of raw values. You'll find it useful in situations where distributions are skewed or data is ordinal.

What does a significant Kruskal-Wallis result mean?

A significant result - typically p < 0.05 - means at least one group differs from the others. It doesn't tell you which groups are different, just that they're not all the same. To find out which pairs are behind the difference, you need to follow up with a post hoc test like Dunn's test.

What are the assumptions of the Kruskal-Wallis test?

The test requires independent samples, meaning observations in one group don't influence observations in another. Your data needs to be ordinal or continuous - something you can rank. If you want to interpret results as a comparison of medians, the groups should also have similar distribution shapes.

What is the difference between the Kruskal-Wallis test and the Mann-Whitney U test?

The Mann-Whitney U test compares two independent groups, while the Kruskal-Wallis test extends that approach to three or more groups. Both work on ranked data and don't assume normality. If you only have two groups, Mann-Whitney U is the right choice - Kruskal-Wallis is its multi-group equivalent.

When should you use Dunn's test after Kruskal-Wallis?

Run Dunn's test when your Kruskal-Wallis result is significant and you need to know which specific pairs of groups differ. It performs pairwise comparisons between all groups and adjusts the p-values to reduce the chance of false positives. In Python, scikit_posthocs.posthoc_dunn() does this, and in R, the dunn.test package gives you the same functionality.
