Have you ever run a t-test, got a weird p-value, and later found out your data wasn't even close to normally distributed?
This has happened to everyone at some point. The problem with the t-test is that it assumes your data follows a normal distribution. When it doesn't, the results can be misleading. Skewed data and small sample sizes both violate that normality assumption. And real-world data rarely behaves the way textbooks say it should.
The Mann-Whitney U test is here to fix the problem. It's a nonparametric alternative to the t-test that compares two groups based on ranks rather than means, so it doesn't care about your distribution's shape.
In this article, I'll cover what the Mann-Whitney U test is, when to use it, how the math works, and how to run and interpret it in both Python and R.
But what exactly is a t-test? If you have that question, read our Introduction to Python T-Tests blog post - it will answer all of your questions.
What Is the Mann-Whitney U Test?
The Mann-Whitney U test is a nonparametric statistical test used to compare two independent groups.
Unlike the t-test, it doesn't assume your data follows a normal distribution. It compares the distributions of two groups by converting raw values into ranks and analyzing those. That makes it a good choice when your data is skewed, has outliers, or just doesn't meet the normality requirement in any other way.
You'll also see it called the Wilcoxon rank-sum test - the two names refer to the same test.
When to Use the Mann-Whitney U Test
The Mann-Whitney U test needs a specific set of conditions. You should only use it when all of these apply:
- Two independent groups: The samples don't overlap, and one group's values don't influence the other's
- Ordinal or continuous data: Think test scores, response times, or any measured value
- Non-normal distribution: Your data is skewed, has heavy tails, or you can't confirm normality with a small sample
- Small sample sizes: When you don't have enough data to rely on the central limit theorem
Let’s go through an example.
Say you have two classes taught with different methods and you want to know which one produced better exam results. You plot the scores and see they're not normally distributed - one class has a few outliers pulling the distribution to the right. The t-test compares group means, so those outliers pull the mean up and make one class look better than it actually is.
That skewed mean goes into the t-test calculation, and the p-value you get back doesn't reflect the real difference between the groups. The Mann-Whitney U test doesn't have that problem because it works with ranks instead of raw scores. A single outlier can only ever be the highest-ranked value, so it can't distort the result the way it would a mean.
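You can see the outlier effect with a quick sketch. The scores below are made up for illustration: one extreme value drags the mean noticeably but barely moves the median.

```python
import statistics

# Hypothetical exam scores, with and without a single outlier (98)
typical = [60, 62, 65, 66, 68, 70, 71, 73, 75]
with_outlier = typical + [98]

print(statistics.mean(typical), statistics.median(typical))              # ~67.8, 68
print(statistics.mean(with_outlier), statistics.median(with_outlier))    # 70.8, 69.0
```

The mean jumps by about three points because of one score, while the median shifts by only one.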
It's also a go-to when you're working with ordinal data, like survey responses on a 1-5 scale. Those values aren't truly continuous, so computing a mean doesn't make much sense.
Mann-Whitney U Test Formula
The test produces two U statistics, one for each group. Here's the formula:

U1 = n1·n2 + n1(n1 + 1)/2 − R1
U2 = n1·n2 + n2(n2 + 1)/2 − R2

Where:
- n1 and n2 are the sample sizes of group 1 and group 2
- R1 and R2 are the rank sums for each group - the sum of all ranks assigned to each group's observations
The rank sum is calculated by combining all values from both groups, sorting them from lowest to highest, and assigning a rank to each value. The smallest value gets rank 1, the next gets rank 2, and so on. Then you separately add up the ranks belonging to each group.
The test statistic is the smaller of U1 and U2. You then compare it against a critical value or use it to compute a p-value.
The good news is you don't need to calculate this by hand. Python and R both do it for you, which I'll show you shortly.
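Still, walking through the formula once by hand makes it click. Here's a minimal Python sketch using two small made-up samples; it assumes no tied values, so each observation gets a unique rank.

```python
# Manual Mann-Whitney U calculation - illustrative only, assumes no tied values
group1 = [72, 85, 90, 65, 78]
group2 = [60, 55, 74, 68, 80]

combined = sorted(group1 + group2)
rank = {value: i + 1 for i, value in enumerate(combined)}  # rank 1 = smallest

R1 = sum(rank[v] for v in group1)  # rank sum for group 1
R2 = sum(rank[v] for v in group2)  # rank sum for group 2
n1, n2 = len(group1), len(group2)

U1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 = n1 * n2 + n2 * (n2 + 1) / 2 - R2
U = min(U1, U2)  # the test statistic

print(R1, R2)      # 34 21
print(U1, U2, U)   # 6.0 19.0 6.0
```

A useful sanity check: U1 + U2 always equals n1 · n2, so if your two U values don't add up to the product of the sample sizes, something went wrong in the ranking.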
Assumptions of the Mann-Whitney U Test
The Mann-Whitney U test is more flexible than the t-test, but it still has three assumptions you need to respect:
- Independent samples: The two groups don't influence each other. Observations in one group have no relationship to observations in the other
- Ordinal or continuous data: Your data needs to have a natural order - you can say one value is higher or lower than another
- Similar distribution shapes: If you want to interpret the results as a comparison of medians, both groups should have distributions with roughly the same shape. If the shapes are different, the test still works, but you're comparing mean ranks rather than medians
The third assumption confuses people the most.
The Mann-Whitney U test is often described as a test for medians, but that's only true when the two distributions have a similar shape. If they don't, the result tells you something more general - whether values in one group tend to be higher than values in the other.
Mann-Whitney U Test in Python
Python's scipy.stats module has a function for the Mann-Whitney U test. Here's a simple example using exam scores from two classes.
from scipy.stats import mannwhitneyu
class_a = [72, 85, 90, 65, 78, 88, 95, 70, 83, 76]
class_b = [60, 55, 74, 68, 80, 58, 63, 71, 66, 59]
stat, p_value = mannwhitneyu(class_a, class_b, alternative="two-sided")
print(f"U statistic: {stat}")
print(f"P-value: {p_value:.4f}")

U statistic: 88.0
P-value: 0.0046
The alternative="two-sided" argument tells the test you're checking whether the two groups differ in either direction. You're not assuming one group scores higher than the other upfront. If you had a directional hypothesis, you'd use "less" or "greater" instead.
The p-value here is 0.0046, which is below the standard threshold of 0.05. That means you can reject the null hypothesis, as there's a statistically significant difference between the two classes' score distributions.
The U statistic on its own doesn't tell you much without context. You can focus on the p-value to decide whether the difference is statistically significant, and look at the raw data or medians to understand the direction of that difference.
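If you did have a directional hypothesis - say, you expected class A to score higher before collecting the data - you'd flip the alternative argument. Same data as above:

```python
from scipy.stats import mannwhitneyu

class_a = [72, 85, 90, 65, 78, 88, 95, 70, 83, 76]
class_b = [60, 55, 74, 68, 80, 58, 63, 71, 66, 59]

# alternative="greater" tests whether class_a values tend to be larger
stat, p_value = mannwhitneyu(class_a, class_b, alternative="greater")
print(f"P-value: {p_value:.4f}")
```

A one-sided test has more power in the stated direction, but only use it when the direction was decided before looking at the results.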
Mann-Whitney U Test in R
R runs the Mann-Whitney U test through the wilcox.test() function. I’ll use the same exam score example from before.
class_a <- c(72, 85, 90, 65, 78, 88, 95, 70, 83, 76)
class_b <- c(60, 55, 74, 68, 80, 58, 63, 71, 66, 59)
wilcox.test(class_a, class_b, alternative = "two.sided")

	Wilcoxon rank sum exact test

data:  class_a and class_b
W = 88, p-value = 0.002879
alternative hypothesis: true location shift is not equal to 0
The W statistic is the same as the U statistic - R just labels it differently. The interpretation is the same as in Python: a p-value of 0.0029 is below 0.05, so there's a statistically significant difference between the two groups.
You may also see a warning about ties in your data.
That happens when two or more values are identical across both groups, which affects how ranks are assigned. R handles this for you, but if you have a lot of ties, it's worth checking whether your data meets the test's assumptions.
How to Interpret Mann-Whitney U Test Results
The null hypothesis of the Mann-Whitney U test is that the two groups come from the same distribution - in other words, that there's no difference between them. Your task is to find evidence against that.
The p-value is how you do that:
- p < 0.05: You reject the null hypothesis. The two groups are distributed differently, and the difference is statistically significant
- p >= 0.05: You don't have enough evidence to reject the null hypothesis. That doesn't mean the groups are identical - it just means the data doesn't show a clear difference
Just remember the Mann-Whitney U test compares distributions. A significant result tells you that values in one group tend to rank higher than values in the other - not that the average is higher. If you want to describe the direction of the difference, look at the medians of each group, not the means.
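For the exam-score example from earlier, checking the medians tells you which class tended to score higher:

```python
import statistics

class_a = [72, 85, 90, 65, 78, 88, 95, 70, 83, 76]
class_b = [60, 55, 74, 68, 80, 58, 63, 71, 66, 59]

print(statistics.median(class_a))  # 80.5
print(statistics.median(class_b))  # 64.5
```

The significant p-value says the groups differ; the medians say class A is the one scoring higher.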
Mann-Whitney U Test vs t-Test
These two tests solve the same problem (comparing two groups) but they do it differently, and choosing the wrong one will affect your results.
t-test
The t-test compares the means of two groups. It's built on the assumption that your data follows a normal distribution, and when that assumption holds, it's a powerful test.
The problem is that assumption. If your data is skewed or comes from a small sample where normality is hard to confirm, the t-test's results can become unreliable. The mean gets pulled by extreme values, and the p-value reflects that distortion.
Use the t-test when:
- Your data is normally distributed
- You have a large enough sample size
- You're working with continuous data without heavy skew or outliers
Mann-Whitney U test
The Mann-Whitney U test compares distributions rather than means. It ranks all values from both groups together and checks whether one group consistently ranks higher than the other. Because it works with ranks, outliers and skew don't distort the result the same way.
When your data actually is normally distributed, the t-test will detect differences more reliably. The Mann-Whitney U test is more flexible, but you give up some sensitivity.
Use the Mann-Whitney U test when:
- Your data isn't normally distributed
- You're working with ordinal data
- You have a small sample size and can't confirm normality
- Outliers are present and you can't remove them
Here's a quick comparison of both:

| | t-test | Mann-Whitney U test |
|---|---|---|
| Compares | Means | Rank distributions |
| Assumes normality | Yes | No |
| Data type | Continuous | Ordinal or continuous |
| Sensitive to outliers | Yes | No |
| Best with | Large, normally distributed samples | Skewed data, small samples, ordinal data |
When in doubt, check your distribution first. If it's somewhat normal, go with the t-test. If it's not, the Mann-Whitney U test is the safer choice.
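One quick way to run that check in Python is the Shapiro-Wilk test from scipy. The sample below is hypothetical, and the 0.05 cutoff is a convention, not a law:

```python
from scipy.stats import shapiro

# Hypothetical sample - swap in your own data
scores = [72, 85, 90, 65, 78, 88, 95, 70, 83, 76]

stat, p_value = shapiro(scores)
if p_value < 0.05:
    print("Normality rejected - prefer the Mann-Whitney U test")
else:
    print("No strong evidence against normality - a t-test is reasonable")
```

With small samples, also eyeball a histogram - normality tests have little power at n = 10, so a non-significant result isn't proof of normality.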
Common Mistakes with the Mann-Whitney U Test
Most mistakes with this test come down to not understanding what it actually does. Here are the ones that show up most often.
Assuming it compares means
This is the most common one. The Mann-Whitney U test compares distributions, not means. A significant result tells you that values in one group tend to rank higher - not that the average is higher. If you need to describe the difference, report the medians, not the means.
Ignoring distribution shape differences
If the two groups have different distribution shapes - one is skewed right, the other is symmetric - you can't interpret the result as a comparison of medians. The test still runs, but the output shows a difference in the overall distributions, not a shift in the center. Check your distributions before drawing conclusions about medians.
Misinterpreting p-values
A p-value below 0.05 means the difference is statistically significant. It doesn't tell you how large the difference is or whether it matters. A very large sample can produce a significant p-value even when the actual difference between groups is tiny. If effect size matters in your analysis, calculate it separately.
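One common effect size for this test is the rank-biserial correlation, which you can compute directly from the U statistic. A sketch using the exam-score data from earlier:

```python
from scipy.stats import mannwhitneyu

class_a = [72, 85, 90, 65, 78, 88, 95, 70, 83, 76]
class_b = [60, 55, 74, 68, 80, 58, 63, 71, 66, 59]

u, p_value = mannwhitneyu(class_a, class_b, alternative="two-sided")
n1, n2 = len(class_a), len(class_b)

# Rank-biserial correlation: 0 = no effect, values near 1 = large effect
u_min = min(u, n1 * n2 - u)
effect_size = 1 - (2 * u_min) / (n1 * n2)
print(round(effect_size, 2))  # 0.76
```

Here the effect is large as well as significant - that's the piece the p-value alone doesn't tell you.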
Using it for paired data
The Mann-Whitney U test is for two independent groups. If your data is paired - the same subjects measured twice, or matched pairs - you need the Wilcoxon signed-rank test instead.
When You Should Not Use the Mann-Whitney U Test
The Mann-Whitney U test is not the right tool for every situation. Here's when you should go with something else.
Your data is paired
If the same subjects appear in both groups - before and after measurements, or matched pairs - the two samples aren't independent. The Mann-Whitney U test assumes they are, so using it here ignores the relationship between observations and gives you unreliable results. Use the Wilcoxon signed-rank test instead.
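With scipy, the paired version looks like this. The before/after scores are hypothetical, chosen so the differences have no ties or zeros (which the exact test requires):

```python
from scipy.stats import wilcoxon

# Hypothetical paired measurements: same ten subjects, before and after
before = [72, 85, 90, 65, 78, 88, 95, 70, 83, 76]
after = [75, 84, 96, 69, 80, 93, 88, 78, 92, 86]

stat, p_value = wilcoxon(before, after)
print(f"P-value: {p_value:.4f}")
```

Note that wilcoxon() works on the per-subject differences, which is exactly the pairing information that mannwhitneyu() would throw away.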
You have more than two groups
The Mann-Whitney U test only compares two groups at a time. If you're comparing three or more groups, use the Kruskal-Wallis test, which is the nonparametric equivalent of a one-way ANOVA and can handle multiple groups.
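In Python, that's scipy.stats.kruskal. Here's a sketch with a hypothetical third class added to the earlier example:

```python
from scipy.stats import kruskal

class_a = [72, 85, 90, 65, 78, 88, 95, 70, 83, 76]
class_b = [60, 55, 74, 68, 80, 58, 63, 71, 66, 59]
class_c = [81, 77, 92, 85, 73, 88, 79, 84, 90, 86]  # hypothetical third class

stat, p_value = kruskal(class_a, class_b, class_c)
print(f"P-value: {p_value:.4f}")
```

A significant Kruskal-Wallis result only says that at least one group differs; you'd follow up with pairwise comparisons (and a multiple-testing correction) to find which ones.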
You have a large sample with normal data
The Mann-Whitney U test's main advantage is that it doesn't assume normality. If your data is normally distributed and your sample is large enough to confirm that, the t-test is the better choice. It has more statistical power in that situation, which means it's more likely to detect a real difference when one exists.
Conclusion
The Mann-Whitney U test is a great solution when your data isn't normally distributed, so the t-test isn't a good fit.
It works with ranks instead of raw values, so it avoids the assumptions that make parametric tests unreliable on skewed or small-sample data. That makes it a good test for real-world analysis, where data rarely behaves the way you'd like it to.
The bigger lesson here is test selection. No single test works for every dataset. You should always check your data first - its distribution, structure, and sample size - and let those characteristics guide your choice. The right test is the one that fits your data.
If you’re new to statistics or really want to dive deep into the subject, our Statistician in R track will help you get job-ready in just 52 hours of materials.
FAQs
What is the Mann-Whitney U test used for?
The Mann-Whitney U test is used to compare two independent groups when you can't assume the data follows a normal distribution. It ranks all values from both groups together and checks whether one group consistently ranks higher than the other. It works with both ordinal and continuous data.
How is the Mann-Whitney U test different from the t-test?
The t-test compares the means of two groups and assumes normal distribution. The Mann-Whitney U test compares distributions using ranks, so it doesn't make that assumption. When your data is skewed or comes from a small sample, the Mann-Whitney U test is the safer choice.
When should I use the Mann-Whitney U test?
Use it when you have two independent groups, your data is ordinal or continuous, and you can't confirm normality. It's also a good fit when your sample size is small and outliers are present. If your data is normally distributed and your sample is large, the t-test will generally give you better results.
What does the p-value tell you in a Mann-Whitney U test?
A p-value below 0.05 means there's a statistically significant difference between the two groups' distributions. It doesn't tell you how large that difference is or whether it's meaningful in practice. For that, you'd need to calculate effect size separately and look at the medians of each group.
Can I use the Mann-Whitney U test for paired data?
No. The Mann-Whitney U test assumes the two groups are independent, meaning one group's values don't influence the other's. If your data is paired - think before and after measurements on the same subjects - use the Wilcoxon signed-rank test instead. Using the Mann-Whitney U test on paired data ignores the relationship between observations and produces unreliable results.
