Mean vs. Median: Knowing the Difference

Explore the differences between mean and median, learn their applications in data analysis, and know how to choose the right measure for different scenarios.

Jan 31, 2025 · 8 min read

When interpreting data, choosing the right measure of central tendency can make or break your analysis. Among the most common metrics are the mean and median, which are two seemingly straightforward concepts that carry profound implications in data interpretation. While the mean gives us the arithmetic average, the median is the central point in a sorted set of values, such that half the observations lie on either side. But which one is more reliable? The answer often depends on your data's distribution, the presence of outliers, and the story you're trying to tell.

In this article, I will break down the differences between mean and median, their strengths and weaknesses, and how to choose the right one for different scenarios. I will also explore how skewed distributions and outliers affect these measures, providing practical examples and visuals to help you understand these fundamental concepts. We'll also dip a toe into more advanced ideas.

Mean and Median Definitions

To fully understand the differences between the mean and the median, let us look at each of these measures and highlight their key properties.

What is the mean?

The mean can be viewed as the “balance point” (or center of mass) of the data. It considers all data points in a dataset and provides a single value that represents the average. More exactly, the mean is calculated by summing all the values in a dataset and then dividing by the number of values.

What is the median?

The median is the middle value when the data is sorted. It is more robust against outliers than the mean, which often makes it a better measure of central tendency for skewed data.

What about the mode?

The mode is another measure of central tendency, representing the most frequently occurring value in a dataset. For example, in this series:

1, 3, 3, 6, 8, 9

the mode is 3 because it appears twice.
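As a quick check of the series above, here is a minimal sketch using Python's standard `statistics` module. The second list is a made-up example I added to show what happens when two values tie for most frequent.

```python
from statistics import mode, multimode

values = [1, 3, 3, 6, 8, 9]

# mode() returns the single most frequent value
most_common = mode(values)

# multimode() returns every value tied for most frequent,
# which is handy when a dataset has more than one mode
tied_modes = multimode([1, 1, 2, 2, 3])  # hypothetical data

print(most_common)  # 3
print(tied_modes)   # [1, 2]
```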

How to Calculate the Mean and Median

Reading a definition is one thing, but calculating is another. In this section, I will break down the steps for calculating each measure and highlight their computational differences.

How to find the mean

The mean is the arithmetic average of a dataset and is calculated as follows:

Sum the Values: Add up all the numbers in your dataset.
Divide by the Total Number of Values: Take the total sum and divide it by the count of values.

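Here is the process represented as a general equation, where $x_1, \dots, x_n$ are the values and $n$ is the number of values:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$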

How to find the mean. Image by Author

For an example, consider a dataset of exam scores:

78, 85, 92, 88, 70

Step 1 (Sum): 78 + 85 + 92 + 88 + 70 = 413
Step 2 (Divide): 413 ÷ 5 = 82.6

The mean score is 82.6.
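The two steps above can be sketched in a few lines of Python. This is only an illustration using the exam scores from the example; the variable names are my own.

```python
from statistics import mean

scores = [78, 85, 92, 88, 70]

total = sum(scores)                # Step 1: sum the values
manual_mean = total / len(scores)  # Step 2: divide by the count

print(total)         # 413
print(manual_mean)   # 82.6

# The standard library gives the same answer in one call
print(mean(scores))  # 82.6
```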

How to find the median

The median is the middle value of a dataset when arranged in ascending order. Here is how to find it:

Sort the Data: Arrange the values from smallest to largest.
Identify the Middle Value: If the dataset contains an odd number of values, the median is the value in the middle; if the dataset contains an even number of values, the median is the average of the two middle values.

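And here are those steps represented as equations, where $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ are the sorted values:

$$\text{median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$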

Median formula. Image by Author

I also created a visual to highlight the process.

How to find the median. Image by Author

Here’s an example dataset with an odd number of values:

70, 78, 85, 88, 92

Step 1 (Sort): Already sorted.
Step 2 (Middle Value): The third value is 85.

The median is 85.

Here’s another example but with an even number of values:

70, 78, 85, 88

Step 1 (Sort): Already sorted.
Step 2 (Average of middle values): (78 + 85) ÷ 2 = 81.5

The median is 81.5.
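Here is a minimal Python sketch of the sort-then-pick-the-middle procedure, checked against both examples. The function name `median_by_hand` is hypothetical; in practice `statistics.median` does the same job.

```python
from statistics import median

def median_by_hand(values):
    """Return the median by sorting and picking the middle value(s)."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)  # Step 1: sort
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [78, 85, 92, 88, 70]  # unsorted on purpose
even_scores = [88, 70, 85, 78]

print(median_by_hand(odd_scores))   # 85
print(median_by_hand(even_scores))  # 81.5

# Cross-check against the standard library
print(median(odd_scores), median(even_scores))  # 85 81.5
```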

Why the Difference Matters: Outliers and Skew

While both the mean and median describe the center of a dataset, their behavior diverges significantly in the presence of outliers and skewed distributions. Understanding this difference is essential for accurately interpreting data and avoiding misleading conclusions.

Impact of outliers

Outliers are values that are significantly higher or lower than the rest of the data. They can heavily influence the mean but have little to no effect on the median.

Let’s consider a dataset of monthly incomes (in thousands):

3, 3.5, 4, 4.5, 5, 6, 50

The mean income here is about 10.86k, which is pulled far upward by the extreme value of 50k.

On the other hand, the median value is 4.5k, which is, I would argue, a much more typical representation of income for this group.
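To see how much that single value matters, the sketch below computes both measures with and without the 50k income. Dropping the outlier is done here only for comparison; it is not a recommendation to delete data.

```python
from statistics import mean, median

incomes = [3, 3.5, 4, 4.5, 5, 6, 50]  # monthly incomes in thousands
without_outlier = incomes[:-1]        # same data minus the 50k value

print(round(mean(incomes), 2), median(incomes))
# 10.86 4.5

print(round(mean(without_outlier), 2), median(without_outlier))
# 4.33 4.25
```

Removing one value moves the mean from about 10.86k to about 4.33k, while the median only moves from 4.5k to 4.25k.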

Skewed distributions

The mean and median also differ in their representation of data in skewed distributions (datasets that are not symmetrical).

For example, in right-skewed distributions (e.g., income or housing prices), most values are clustered at the lower end, with a few extreme values pulling the tail to the right.

Mean: Shifts toward the tail, resulting in a value higher than the median.
Median: Remains closer to the cluster of typical values, better reflecting the “typical” case.

Consider incomes:

30k, 35k, 40k, 45k, 50k, 100k, 200k

Mean: 71.4k (pulled upward by 100k and 200k).
Median: 45k (closer to the majority of incomes).

Why this matters

In skewed data: The median is often more representative of a “typical” data point because it is not pulled by extreme values.
In symmetrical data: The mean and median will be nearly identical, so either can be used as a measure of central tendency.

One takeaway is that you should always examine your data’s distribution before deciding whether to use the mean or the median. Tools like histograms and box plots, which I cover in the visualization section, can help you spot skewness and identify outliers. Comparing the mean with the median is itself a quick skewness check: a mean well above the median suggests a right skew, and a mean well below it suggests a left skew.
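One common way to turn the mean-median gap into a number is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. This statistic is not part of the article's examples; the sketch below is my own illustration, the function name is hypothetical, and I use the sample standard deviation as an assumption.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    spread = stdev(values)  # sample standard deviation (needs >= 2 values)
    if spread == 0:
        return 0.0          # all values identical, so no skew
    return 3 * (mean(values) - median(values)) / spread

right_skewed = [30, 35, 40, 45, 50, 100, 200]  # incomes from the example, in thousands
symmetric = [1, 2, 3, 4, 5]                    # hypothetical symmetric data

print(pearson_median_skewness(right_skewed))  # positive: mean is above the median
print(pearson_median_skewness(symmetric))     # 0.0: mean equals the median
```

Treat the result as a rough indicator only. On small samples it is noisy, so it works best alongside a histogram or box plot.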

Choosing Mean or Median in Different Scenarios

When analyzing data, deciding whether to use the mean or median depends on the characteristics of your dataset and the insights you are trying to extract. Below is a quick reference table to guide your choice:

Use the Mean When	Use the Median When
The data distribution is approximately normal (symmetrical).	The data is highly skewed (e.g., income, property values).
Outliers are minimal or irrelevant to the analysis.	Outliers are present and could distort the results if included.
You need a measure that is sensitive to every data point, such as in predictive modeling or when calculating totals.	You want to reflect the “typical” value rather than the “mathematical center” of the dataset.

Here’s a practical tip that will really help you: Always start with a visual analysis of your data (e.g., a histogram or box plot) to check for symmetry, skewness, and the presence of outliers. This will help you decide whether the mean or median is a better fit for your scenario.

Visualizing Mean vs. Median

Visualizations are powerful tools for understanding the behavior of the mean and median in different datasets. They can clearly demonstrate how these measures respond to outliers and skewed distributions, helping to inform better data-driven decisions.

Bar chart example

Imagine a small dataset of incomes in thousands:

30, 35, 40, 45, 50, 55, 1000

The following bar chart demonstrates how a single extreme value can drastically affect the mean, while leaving the median relatively stable. In this case, most data points cluster between 30 and 55, but the presence of an outlier (1000) pulls the mean upward.

Bar chart showing effect of an outlier on mean vs. median. Image by Author

Histogram example

In a right-skewed distribution (such as incomes or housing prices), the mean is often pulled toward the long tail of high values, while the median remains closer to the “typical” data point. This makes the median a better measure of central tendency in such cases.

The histogram below shows a simulated income distribution where the mean (red dashed line) is significantly larger than the median (green dashed line) due to the skew.

Histogram showing a right-skewed distribution. Image by Author

Notice how the right skew stretches the tail, creating a clear gap between the mean and the median.
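If you want to reproduce something like this yourself, here is a rough sketch using only the standard library. I am assuming a log-normal distribution as a stand-in for incomes. The parameters, the seed, and the text-based histogram are all my own choices, not the data behind the figure above.

```python
import random
from statistics import mean, median

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulated incomes in thousands; the log-normal parameters are arbitrary
incomes = [random.lognormvariate(3.8, 0.6) for _ in range(10_000)]

sample_mean = mean(incomes)
sample_median = median(incomes)

# Crude text histogram: 10k-wide bins, with everything from 150k up in one bin
bin_width, cap = 10, 150
last_bin = cap // bin_width
counts = [0] * (last_bin + 1)
for value in incomes:
    index = min(int(value // bin_width), last_bin)
    counts[index] += 1

for i, count in enumerate(counts):
    suffix = "+" if i == last_bin else " "
    print(f"{i * bin_width:>3}{suffix} " + "#" * (count // 50))

print(f"mean   = {sample_mean:.1f}k")
print(f"median = {sample_median:.1f}k")
```

The bars pile up on the left and thin out to the right, and the printed mean lands above the median, which is the pattern the histogram illustrates.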

Box plot example

A box plot is an excellent way to visualize the impact of outliers on the median. Below, we compare two groups: one with outliers and one without. The median (vertical line inside the box) remains stable even with the presence of extreme values, but the overall range of the data is heavily impacted by the outlier.

Box plot showing effect of outliers on median. Image by Author
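The numbers behind a box plot are easy to compute directly. The sketch below uses two made-up groups of my own and the common 1.5 × IQR rule for flagging outliers. Plotting libraries may compute quartiles slightly differently, so treat the exact quartile values as an assumption of this method.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the median, quartiles, and 1.5 * IQR outliers for a dataset."""
    q1, q2, q3 = quantiles(values, n=4)  # quartiles; q2 is the median
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return {"median": q2, "q1": q1, "q3": q3, "outliers": outliers}

# Hypothetical groups: identical except for the largest value
without_outlier = [30, 35, 40, 45, 50, 55, 60]
with_outlier = [30, 35, 40, 45, 50, 55, 1000]

print(box_plot_summary(without_outlier))
print(box_plot_summary(with_outlier))
```

Both groups report a median of 45 and the same quartiles. Only the second group flags 1000 as an outlier, which is why the box stays put while the overall range stretches.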

These visualizations highlight how the mean and median respond to different data characteristics, providing clarity on when to use each measure. Whether analyzing skewed data, outlier-prone datasets, or comparing groups, visual aids like these can make complex relationships much easier to grasp.

Some More Advanced Ideas

Let's now look at some more advanced ideas if you are curious to learn more.

Mean vs. median imputation

Now, if you are a data scientist and you need to fill in gaps in your data, you may have to choose an imputation method. You might now be wondering, what is the practical difference between mean vs. median imputation?

As you might guess, mean imputation replaces missing values with the average of the available data, which, as we have said, can be skewed by extreme values. Median imputation, on the other hand, replaces missing values with the middle value of the dataset.

A useful rule of thumb is to look at the distribution of your data first. If the distribution is skewed and many values are missing, mean imputation fills every gap with a value that has been pulled toward the tail, which can noticeably alter the shape of your data. Median imputation is usually the safer choice in that case.
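Here is a small sketch of both approaches on a made-up list of incomes in which `None` marks a missing value. The data and variable names are hypothetical, and real projects would more likely use a library such as pandas or scikit-learn for this.

```python
from statistics import mean, median

# Hypothetical incomes in thousands, with two gaps and one extreme value
incomes = [30, 35, None, 40, 45, None, 50, 1000]

observed = [x for x in incomes if x is not None]
fill_mean = mean(observed)      # 200, pulled upward by the 1000
fill_median = median(observed)  # 42.5, close to the typical value

mean_imputed = [fill_mean if x is None else x for x in incomes]
median_imputed = [fill_median if x is None else x for x in incomes]

print(mean_imputed)
# [30, 35, 200, 40, 45, 200, 50, 1000]
print(median_imputed)
# [30, 35, 42.5, 40, 45, 42.5, 50, 1000]
```

With mean imputation, the two filled-in values (200) look nothing like the bulk of the data. With median imputation, they blend in with the typical incomes.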

Mean vs. median: parametric or non-parametric?

In many parametric methods, the mean (and variance) are central parameters. For example, a simple linear regression model assumes errors are normally distributed around a mean. When your data meet the normality assumption, the sample mean is a natural estimator and fits well within parametric frameworks.

The median, by contrast, has a non-parametric orientation, and I would call it the quintessential non-parametric measure of central tendency. Many rank-based tests like the Mann–Whitney effectively compare medians (or distributions) rather than means. So, if your data show strong skew or contain outliers, focusing on the median aligns more naturally with non-parametric statistics.

All this is to say that understanding the distinction between the mean and the median is not just about describing data correctly; it also matters in hypothesis testing.

Mean vs. median stability testing

When deciding whether to use a mean or a median, one key question is how stable our statistics are for a given dataset. Bootstrapping is one option that would allow us to empirically estimate the sampling distribution of both the mean and the median by repeatedly resampling (with replacement) from the original data.

You could highlight the differences in mean and median stability empirically. You could introduce a few outliers into a dataset and then re-run a bootstrap procedure, thus letting you visually show how the mean’s distribution shifts more dramatically than that of the median. Also, bootstrapping can make it concrete by showing how large or small your confidence intervals might be in realistic scenarios. Read our tutorial on applying bootstrap methods to learn more.
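As a rough sketch of that experiment, the code below bootstraps the mean and the median of a small made-up dataset that contains one outlier. The data, the number of resamples, the seed, and the helper names are all assumptions of mine, and the percentile interval shown is the simplest variant rather than the only option.

```python
import random
from statistics import mean, median, stdev

def bootstrap(data, statistic, n_resamples=2000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `data`."""
    rng = random.Random(seed)
    return [statistic(rng.choices(data, k=len(data))) for _ in range(n_resamples)]

def percentile_interval(estimates, level=0.95):
    """Simple percentile confidence interval from bootstrap estimates."""
    ordered = sorted(estimates)
    lower = ordered[int((1 - level) / 2 * len(ordered))]
    upper = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lower, upper

# Hypothetical incomes in thousands: 25 ordinary values plus one outlier
data = list(range(30, 80, 2)) + [1000]

boot_means = bootstrap(data, mean)
boot_medians = bootstrap(data, median)

print("spread of bootstrap means:  ", round(stdev(boot_means), 1))
print("spread of bootstrap medians:", round(stdev(boot_medians), 1))
print("95% interval for the mean:  ", percentile_interval(boot_means))
print("95% interval for the median:", percentile_interval(boot_medians))
```

The bootstrap means vary far more than the bootstrap medians, because each resample may include the outlier zero, one, or several times, and every inclusion drags the mean upward.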

Mean vs. median as optimization problems

Let me now provide an alternate but equally true definition: The mean is the value that minimizes the sum of squared deviations from the data, whereas the median is the value that minimizes the sum of absolute deviations.

Take a look at this equation, which gives the sum of squared deviations of the data points $x_1, \dots, x_n$ from a candidate value $m$:

$$S(m) = \sum_{i=1}^{n} (x_i - m)^2$$

If you take the derivative of this equation with respect to $m$, set it to zero, and solve, you will find that the minimizing value is simply the arithmetic mean. This matters because in many statistical methods, like OLS regression, we minimize squared errors for mathematical convenience and to conform to assumptions of normally distributed errors.
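For readers who want the intermediate steps, here is the derivation written out. I write the sum of squared deviations as $S(m)$ and the data points as $x_1, \dots, x_n$; this notation is my own.

$$
\begin{aligned}
S(m) &= \sum_{i=1}^{n} (x_i - m)^2 \\
\frac{dS}{dm} &= -2 \sum_{i=1}^{n} (x_i - m) = 0 \\
\sum_{i=1}^{n} x_i &= n\,m \\
m &= \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
\end{aligned}
$$

The second derivative is $2n > 0$, so this stationary point is a minimum.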

Now consider a different idea: Instead of squaring each deviation, we measure the absolute error between $m$ and each data point:

$$A(m) = \sum_{i=1}^{n} |x_i - m|$$

Here we want to find the $m$ that minimizes this total absolute deviation. It turns out (by analyzing the derivative of the absolute loss, or by a geometric argument) that the solution is the median of the dataset.

Intuitively, if $m$ is to the left of the median, there are more data points on the right pulling it to move over. Only the median is where the pull from left and right balances out, minimizing total absolute distance.
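You can check both claims numerically with a brute-force search. The sketch below tries every candidate $m$ on a 0.01 grid and keeps the one with the smallest loss. The grid search is only for illustration (it is not how anyone computes a mean or median in practice), and I reuse the monthly incomes from the outlier example.

```python
from statistics import mean, median

data = [3, 3.5, 4, 4.5, 5, 6, 50]  # monthly incomes in thousands

def squared_loss(m):
    """Sum of squared deviations of the data from m."""
    return sum((x - m) ** 2 for x in data)

def absolute_loss(m):
    """Sum of absolute deviations of the data from m."""
    return sum(abs(x - m) for x in data)

# Candidate values 0.00, 0.01, ..., 50.00
candidates = [i / 100 for i in range(5001)]

best_squared = min(candidates, key=squared_loss)
best_absolute = min(candidates, key=absolute_loss)

print(best_squared, round(mean(data), 2))  # 10.86 10.86
print(best_absolute, median(data))         # 4.5 4.5
```

The squared loss bottoms out at the grid point nearest the mean, and the absolute loss bottoms out exactly at the median.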

Mean vs. median computational complexity

Finally, the mean is computationally simpler at scale: you can compute it in a single pass and update it incrementally as data streams in, without sorting or even storing the values.

Computing an exact median usually requires sorting the data or running a selection algorithm, and both need access to the full dataset. This can be computationally expensive, especially with millions of values. For very large datasets, approximate algorithms (like streaming or quantile-based algorithms) can be used to estimate the median more efficiently. Our new Concepts in Computer Science course is a great resource for learning about these things.
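To make the contrast concrete, here is a sketch of a running mean next to an exact running median. The class names are hypothetical. The two-heap approach is one well-known technique for exact running medians, and it still stores every value, whereas the approximate algorithms mentioned above trade exactness for less memory.

```python
import heapq

class RunningMean:
    """Incremental mean: constant memory, constant time per update."""
    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        # Nudge the current mean toward x by 1/count of the gap
        self.value += (x - self.value) / self.count
        return self.value

class RunningMedian:
    """Exact running median with two heaps: stores all values, O(log n) per update."""
    def __init__(self):
        self.low = []   # max-heap (values stored negated) for the smaller half
        self.high = []  # min-heap for the larger half

    def update(self, x):
        # Push onto the lower half, then move its largest value to the upper half
        heapq.heappush(self.low, -x)
        heapq.heappush(self.high, -heapq.heappop(self.low))
        # Rebalance so the lower half is never smaller than the upper half
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))
        if len(self.low) > len(self.high):
            return -self.low[0]
        return (-self.low[0] + self.high[0]) / 2

running_mean, running_median = RunningMean(), RunningMedian()
for score in [78, 85, 92, 88, 70]:  # exam scores from the earlier example
    current_mean = running_mean.update(score)
    current_median = running_median.update(score)

print(round(current_mean, 1))  # 82.6
print(current_median)          # 85
```

The mean needs two numbers no matter how much data arrives, while the median has to keep both heaps in memory.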

Next Steps

As you have seen, the mean is the arithmetic average of a dataset, which makes it sensitive to extreme values, while the median represents the middle value in an ordered dataset. The right choice can make all the difference. That said, in real-world analyses it is often best to report both the mean and the median alongside additional statistics like the mode, standard deviation, and percentiles, because together they give a more complete picture.

If you’re eager to dig deeper into statistical concepts, there are several areas worth focusing on. Start by reading up on more advanced variations of the mean, such as the trimmed mean, geometric mean, and weighted mean, each of which has its own purpose. I would also take our technology-agnostic Introduction to Statistics course.

Then, to really become more of an expert, you will want to choose and master a tool. Our Introduction to Statistics in R course and Statistician in R career track are both very informative starting points if you want to use R, which is a popular language for data science and statistics. If you prefer working with spreadsheets or a programming language like Python, our Introduction to Statistics in Google Sheets course and Introduction to Statistics in Python course provide a hands-on approach to statistical analysis using formulas and powerful libraries.

Author

Samuel Shaibu

Mean vs. Median FAQs

What is the main difference between the mean and median?

When should I use the median instead of the mean?

Can the mean and median be the same?

Are there situations where neither mean nor median is sufficient?

Why is the mean more affected by outliers than the median?

To answer this question, consider how the mean is calculated: The mean is the sum of all data values divided by the number of observations. An outlier (an extremely high or low value) heavily influences that sum, pulling the mean away from what might be considered a typical value.

Now consider how the median is calculated: The median is the middle value in a sorted dataset. It depends only on the ordering of the data, not on how large or small the individual points are. A single outlier can shift the middle position by at most one place in the sorted list, so it barely affects the median.

How do you think about choosing between the mean and median?

