
Geometric Mean: A Measure for Growth and Compounding

Discover the power of the geometric mean in finance, biology, and data science. Learn how to calculate it, when to use it, and why it's useful for analyzing growth rates.
Nov 1, 2024  · 8 min read

Have you ever calculated the average of your yearly investment returns, only to find that your actual overall return doesn't quite match up? This common scenario in finance highlights the importance of understanding and applying the geometric mean, a measure that often proves more appropriate than the widely used arithmetic mean when dealing with rates of change and compounding effects.

In this tutorial, we'll explore the concept of the geometric mean and its role in data analysis, particularly in fields like finance, biology, and data science. We will also examine scenarios where it outperforms other measures of central tendency.

What is the Geometric Mean?

The geometric mean is a type of average for sets of positive numbers. Unlike the more common arithmetic mean, which adds numbers and divides by the count, the geometric mean multiplies the numbers together and takes the nth root of the product, which makes it particularly adept at handling datasets involving multiplicative relationships or exponential growth.

Here are some of the fields where the geometric mean is valuable:

  1. Finance: It's commonly used to calculate average rates of return over multiple periods. Unlike the arithmetic mean, the geometric mean accounts for compounding, making it more accurate for financial calculations.
  2. Biology: In population growth studies, the geometric mean is used to calculate average growth rates over time.
  3. Geometry: One of its most elegant appearances is in right triangles, where the altitude to the hypotenuse equals the geometric mean of the two segments it creates on that hypotenuse. The geometric mean of a rectangle's two side lengths is also the side length of a square with the same area, providing a way to "square" rectangular areas.
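Both geometric facts can be checked numerically. The sketch below uses a 3-4-5 right triangle chosen purely for illustration, and computes the hypotenuse segments with the standard similar-triangle relations p = a²/c and q = b²/c; the variable names are our own.

```python
import math

# Legs of a 3-4-5 right triangle (illustrative values)
a, b = 3.0, 4.0
c = math.hypot(a, b)  # hypotenuse: 5.0

# The altitude to the hypotenuse splits it into segments p and q
p = a**2 / c  # segment next to leg a (1.8)
q = b**2 / c  # segment next to leg b (3.2)

# Altitude computed independently from the area: a*b/2 == c*h/2
altitude = a * b / c  # 2.4

gm_segments = math.sqrt(p * q)  # geometric mean of the two segments
print(altitude, gm_segments)    # both approximately 2.4

# The same value is the side of a square with the same area as a p-by-q rectangle
print(gm_segments**2, p * q)    # both approximately 5.76
```

Floating-point rounding can leave a tiny difference in the last digits, which is why comparisons like these are best done with `math.isclose`.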

As you can see, the geometric mean is a valuable tool both in everyday calculations and in more abstract mathematical concepts.

How to Calculate the Geometric Mean

There are several methods to compute the geometric mean, each with its own advantages depending on the situation.

The geometric mean formula

The most straightforward way to calculate the geometric mean is by using the standard formula:

$$\text{GM} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$$

Here, x₁, x₂, ..., xₙ are the positive numbers in the dataset, and n is the count of numbers. To obtain the geometric mean, we multiply all the values together and take the nth root of the product. Let's work through an example. Consider the numbers 2, 4, and 8. To find their geometric mean:

  1. Multiply the numbers: 2 * 4 * 8 = 64
  2. Take the cube root (since there are 3 numbers): ∛64 = 4

The geometric mean of 2, 4, and 8 is 4. This result tells us that if we had three equal numbers, each equal to 4, their product would be the same as the product of the original numbers (4 * 4 * 4 = 64). In other words, 4 is the single value that can stand in for all three numbers when they are multiplied together.
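The direct formula translates into a few lines of Python. This is a minimal sketch assuming Python 3.8+ (for `math.prod`); `geometric_mean_direct` is a hypothetical helper name, and rejecting non-positive inputs is our own design choice.

```python
import math

def geometric_mean_direct(values):
    """Hypothetical helper: geometric mean as the nth root of the product of the values."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("this sketch only accepts positive values")
    product = math.prod(values)          # 2 * 4 * 8 = 64
    return product ** (1 / len(values))  # nth root of the product

print(geometric_mean_direct([2, 4, 8]))  # approximately 4
```

Because 1/3 cannot be represented exactly in floating point, the printed result may differ from 4 in the last digits.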

Calculating the geometric mean by adding logarithms

For larger datasets, or to avoid overflow when multiplying many large numbers (or underflow with many small ones), we can use logarithms. This method turns multiplication into addition and makes calculations more manageable. Here, we first take the logarithm of each number, calculate the arithmetic mean of these logarithms, and then take the antilog (exponential) of this mean. Let's use the same example as above.

  1. Take logarithms (base 10): log(2) ≈ 0.3010, log(4) ≈ 0.6021, log(8) ≈ 0.9031
  2. Calculate the arithmetic mean: (0.3010 + 0.6021 + 0.9031) / 3 ≈ 0.6021
  3. Take the antilog (raise 10 to that power): 10^0.6021 ≈ 4

This method gives us the same result as the direct calculation method. You can use logarithms of any base (such as natural logarithms with base e), as long as you use the same base throughout the calculation. The final geometric mean will be the same.
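Here is a sketch of the logarithm method, assuming Python 3.8+. `geometric_mean_log` is a hypothetical helper whose `base` parameter lets you confirm that the choice of base does not change the result; the standard library's `statistics.geometric_mean` is shown for comparison. The last lines illustrate the overflow problem: the raw product of three values of 1e200 exceeds the float range, while the average of their logarithms does not.

```python
import math
import statistics

def geometric_mean_log(values, base=math.e):
    """Hypothetical helper: antilog of the arithmetic mean of the logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    mean_log = sum(math.log(v, base) for v in values) / len(values)  # mean of the logs
    return base ** mean_log  # antilog undoes the logarithm

data = [2, 4, 8]
print(geometric_mean_log(data))           # natural logs: approximately 4
print(geometric_mean_log(data, base=10))  # base-10 logs: approximately 4
print(statistics.geometric_mean(data))    # standard-library version

# Where logs help: the float product of these values overflows
huge = [1e200, 1e200, 1e200]
print(math.prod(huge))           # inf
print(geometric_mean_log(huge))  # approximately 1e200
```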

Geometric Mean in Data Science and Machine Learning

The geometric mean is a useful statistical measure with several important applications in data science and machine learning. Here are three key use cases where the geometric mean proves valuable. 

Growth rates and time series analysis

In data science, we often analyze growth rates over time, such as in population dynamics, market trends, or user adoption rates. The geometric mean is ideal for these situations because it accurately captures the compounding nature of growth. Unlike the arithmetic mean, which simply averages the rates without considering how each year's growth builds upon the last, the geometric mean accounts for this compounding effect.

Imagine you're analyzing the annual growth rate of a startup's user base over five years: 20%, 15%, 25%, 10%, and 30%. While the arithmetic mean (20%) might seem like a quick solution, it doesn't account for the compounding effect. The geometric mean ((1.20 * 1.15 * 1.25 * 1.10 * 1.30)^(1/5) - 1 ≈ 19.79%) provides a more accurate average growth rate: applied consistently each year, it produces the same final value as the actual varied growth rates.
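A short sketch of this calculation, assuming Python 3.8+ for `statistics.fmean` and `statistics.geometric_mean`. The key step is converting each rate to a growth factor (1 + r) before averaging; compounding both averages for five years then shows which one reproduces the actual overall growth.

```python
import math
import statistics

# Annual user-growth rates from the example, as decimals
growth_rates = [0.20, 0.15, 0.25, 0.10, 0.30]
growth_factors = [1 + r for r in growth_rates]  # 20% growth -> factor 1.20

arithmetic_rate = statistics.fmean(growth_rates)
geometric_rate = statistics.geometric_mean(growth_factors) - 1

actual = math.prod(growth_factors)  # overall multiplier after five years
print(f"Arithmetic mean rate: {arithmetic_rate:.2%}")  # 20.00%
print(f"Geometric mean rate:  {geometric_rate:.2%}")   # 19.79%
print(f"Actual five-year multiplier:        {actual:.4f}")
print(f"Geometric rate compounded 5 times:  {(1 + geometric_rate) ** 5:.4f}")   # matches actual
print(f"Arithmetic rate compounded 5 times: {(1 + arithmetic_rate) ** 5:.4f}")  # overshoots
```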

Statistical analysis of skewed data

In data science, we often deal with datasets that are positively skewed due to multiplicative processes, such as income levels, biological measurements, or certain financial indicators. In such cases, using the arithmetic mean can be misleading because it is sensitive to extreme values (outliers), which can distort the representation of the central tendency.

The geometric mean is particularly useful when dealing with data that follow a log-normal distribution, where the geometric mean of the distribution equals its median. Here's why:

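  • Dampens the Influence of Large Outliers: The geometric mean reduces the impact of extremely high values, providing a more representative central value for right-skewed data. The opposite holds for values near zero: a single very small value pulls the geometric mean down sharply.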
  • Handles Multiplicative Relationships: It is appropriate for data where values are combined multiplicatively rather than additively.

Consider a dataset of household incomes in a region where most households earn between $30,000 and $60,000, but a few earn over $1 million.

  • Arithmetic Mean Income: This will be heavily influenced by the high-income outliers, potentially suggesting an average income much higher than what most people earn.
  • Geometric Mean Income: This provides a central value that reflects the typical income more accurately by minimizing the influence of the extreme values.
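To make this concrete, the sketch below uses a small, entirely hypothetical set of incomes (seven households between $30,000 and $60,000, plus one at $1.2 million) and compares the arithmetic mean, geometric mean, and median using Python's `statistics` module (Python 3.8+).

```python
import statistics

# Hypothetical household incomes: most between $30k and $60k, plus one $1.2M outlier
incomes = [30_000, 35_000, 40_000, 45_000, 50_000, 55_000, 60_000, 1_200_000]

print(f"Arithmetic mean: ${statistics.fmean(incomes):,.0f}")           # $189,375
print(f"Geometric mean:  ${statistics.geometric_mean(incomes):,.0f}")  # roughly $66,000
print(f"Median:          ${statistics.median(incomes):,.0f}")          # $47,500
```

With this made-up data, the single outlier more than triples the arithmetic mean relative to the median, while the geometric mean stays much closer to the typical household.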

By using the geometric mean in such cases, data scientists can obtain a more accurate measure of central tendency, leading to better insights and more reliable analyses. This makes the geometric mean an essential tool when working with skewed data distributions in various fields.

Evaluation metrics for imbalanced datasets

In classification tasks, we often encounter imbalanced datasets, where some classes have far more examples than others. In such cases, traditional evaluation metrics like accuracy can be misleading. This is where the Geometric Mean Score, or G-Mean, becomes a valuable tool.

The G-Mean is the nth root of the product of the per-class sensitivities (recalls), where n is the number of classes. For binary classification, the recall of the negative class is the specificity, so the G-Mean reduces to the square root of the product of sensitivity and specificity.

G-Mean rewards classifiers that perform well on every class, which makes it particularly useful for imbalanced datasets. However, because it is a product, G-Mean drops to zero as soon as any single class has zero sensitivity, so it cannot distinguish between models that completely miss at least one class. In practice, modified versions or alternative metrics may be used to address this limitation.
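The sketch below computes G-Mean from scratch so the definition is visible; `per_class_recall` and `g_mean` are hypothetical helper names, and the labels are invented for illustration (95 negatives, 5 positives). If you already use imbalanced-learn, its `imblearn.metrics.geometric_mean_score` offers a ready-made version; check its documentation for how it treats zero recall.

```python
import math
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Hypothetical helper: correct predictions of each class / true examples of that class."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / n for cls, n in support.items()}

def g_mean(y_true, y_pred):
    """Hypothetical helper: nth root of the product of per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))

# Invented imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5

# A useless model that always predicts the majority class
always_negative = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print(accuracy)                         # 0.95
print(g_mean(y_true, always_negative))  # 0.0, because recall on class 1 is zero

# A model that flags 10 negatives by mistake but catches 4 of 5 positives
better = [0] * 85 + [1] * 10 + [1] * 4 + [0]
print(g_mean(y_true, better))  # sqrt((85/95) * 0.8), about 0.846
```

Accuracy favors the useless model (0.95 vs. 0.89), while G-Mean correctly prefers the one that actually detects positives.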

To see more of how the geometric mean and other statistical measures are applied in machine learning, consider exploring our Machine Learning Scientist with Python career track, which offers in-depth lessons covering essential machine learning concepts and techniques, along with practical insights and hands-on experience.


Geometric Mean vs. Other Means

When analyzing data, the type of mean you choose will impact the representation and interpretation of your results. The arithmetic mean, geometric mean, and harmonic mean are three different types of averages, each suited to specific types of data and contexts. Understanding the differences between them helps in selecting the right measure for your dataset.

Geometric mean vs. arithmetic mean

The arithmetic mean is the most commonly used average and is calculated by summing all the numbers in a dataset and dividing by the count of numbers. It is best suited for additive processes where values are combined through addition. It is appropriate for datasets that do not contain extreme outliers or skewed distributions. This mean is commonly used in calculating average scores, temperatures, and other quantities where the values sum up to form a total.

In contrast, the geometric mean is ideal for multiplicative processes, where values combine through multiplication. It is appropriate for analyzing growth rates, percentages, and ratios. The geometric mean is often used in financial calculations like average return rates, biological growth rates, and scenarios involving compounding, as it accurately accounts for the effects of exponential growth or decay.

Geometric mean vs. harmonic mean

The harmonic mean is calculated as the reciprocal of the arithmetic mean of the reciprocals of the data values. It is especially useful when dealing with data that are rates or ratios, and when smaller values need more emphasis. The harmonic mean is best applied in situations where the data points are defined in relation to some unit (like time or distance), and you want to find an average rate.

For instance, the harmonic mean is ideal for calculating average speeds when covering the same distance at different speeds. Since time varies inversely with speed, the harmonic mean accurately accounts for the time spent traveling at each speed, giving a true average. It emphasizes the influence of lower values, ensuring that slower speeds (which take more time) have a greater impact on the overall average speed.
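A worked version of the speed example, with an illustrative trip of 120 km at 60 km/h followed by the same 120 km at 40 km/h. The true average speed is total distance divided by total time, and `statistics.harmonic_mean` (Python 3.6+) recovers it directly from the two speeds.

```python
import statistics

# Illustrative trip: 120 km at 60 km/h, then the same 120 km at 40 km/h
distance_km = 120
speeds_kmh = [60, 40]

total_time_h = sum(distance_km / s for s in speeds_kmh)        # 2 h + 3 h = 5 h
true_average = (distance_km * len(speeds_kmh)) / total_time_h  # 240 km / 5 h

print(true_average)                          # 48.0
print(statistics.harmonic_mean(speeds_kmh))  # matches the true average
print(statistics.fmean(speeds_kmh))          # 50.0, overstates the average speed
```

The arithmetic mean overstates the speed because the slower leg takes more time and should count for more.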

In short, the geometric mean handles multiplicative relationships and suits growth rates and proportional changes, while the harmonic mean focuses on rates and ratios where the reciprocal relationship is key.
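The three means are also linked by a standard inequality: for any set of positive numbers, HM ≤ GM ≤ AM, with equality only when all values are identical. The sketch below checks this on the 2, 4, 8 example from earlier using Python's `statistics` module (Python 3.8+).

```python
import statistics

data = [2, 4, 8]
am = statistics.fmean(data)           # (2 + 4 + 8) / 3
gm = statistics.geometric_mean(data)  # cube root of 64
hm = statistics.harmonic_mean(data)   # 3 / (1/2 + 1/4 + 1/8)

print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")  # AM = 4.667, GM = 4.000, HM = 3.429
```

The more spread out the values, the wider the gaps between the three means.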

Things to Consider with the Geometric Mean

When deciding whether to use the geometric mean for analyzing data, it's important to understand both its advantages and potential limitations. The geometric mean is a powerful tool for certain types of data, but it may not be appropriate in all situations.

Advantages of the geometric mean

  • Best for Multiplicative Data Sets: The geometric mean is ideal for data that involve multiplicative processes, such as growth rates, percentages, ratios, and indices. It accurately captures the compounded effect of changes over time or across different factors.
  • Mitigates the Impact of Outliers: Compared to the arithmetic mean, the geometric mean reduces the influence of extremely large values (outliers) in a dataset. This makes it a better measure of central tendency for skewed data distributions, especially when the data are positively skewed due to a few large values.

Limitations of the geometric mean

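  • Cannot Handle Negative Numbers or Zero Values: The geometric mean is defined only for positive real numbers. A single zero forces the product, and therefore the geometric mean, to zero regardless of the other values; negative values can make the product negative, and an even root of a negative number is not a real number. The logarithm-based method fails for the same reason, because the logarithm of zero or a negative number is undefined in the real number system. For returns, this is why we average growth factors (1 + r), which stay positive as long as no period loses 100% or more.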
  • May Not Be Intuitive for All Datasets: In some contexts, the geometric mean may not provide an easily interpretable measure of central tendency, especially for datasets that do not involve multiplicative relationships. For additive processes or data where values are combined through addition, the arithmetic mean may be more intuitive and appropriate.
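Negative returns are therefore not a problem in practice, as long as they are converted to growth factors first. The sketch below assumes simple per-period returns expressed as decimals; `average_period_return` is a hypothetical helper name, and the returns are illustrative.

```python
import math

def average_period_return(returns):
    """Hypothetical helper: geometric average per-period return from simple returns (0.10 = +10%)."""
    factors = [1 + r for r in returns]  # -20% becomes 0.80, which is still positive
    if not factors or any(f <= 0 for f in factors):
        raise ValueError("need at least one return, each greater than -100%")
    mean_log = sum(math.log(f) for f in factors) / len(factors)
    return math.exp(mean_log) - 1

returns = [0.10, -0.20, 0.15]  # illustrative returns, including a loss
g = average_period_return(returns)
print(f"{g:.2%}")  # 0.40% per period
```

Compounding that rate for three periods reproduces the overall growth factor 1.10 * 0.80 * 1.15 = 1.012.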

Sensitivity to extreme values

To understand how the geometric mean handles outliers differently than the arithmetic mean, let's compare two investment return datasets over five years: Set A with returns of 5%, 7%, 9%, 6%, and 8%, and Set B with the same first four returns but an extreme 50% return in the final year (5%, 7%, 9%, 6%, and 50%).

  • Arithmetic Mean
    • Set A: 7% average return.
    • Set B: The high 50% return inflates the average to 15.4%, demonstrating how arithmetic means are easily skewed by outliers.
  • Geometric Mean
    • Set A: 6.99% average growth rate.
    • Set B: The extreme return lifts the average to about 14.26%, a smaller jump than the arithmetic mean's, because the geometric mean is pulled up less by a single large value.

The gap between the two averages for Set B, about 1.1 percentage points (15.4% vs. 14.26%), might seem small, but in financial analysis it is large: compounding 15.4% for five years would turn $1 into about $2.05, while the actual returns turn it into about $1.95. Unlike the arithmetic mean, which can exaggerate averages when outliers are present, the geometric mean gives the constant rate that reproduces the actual compounded result.
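The comparison can be reproduced with a few lines of Python (3.8+). The sketch computes both averages for each set and compounds $1 at the arithmetic-mean rate versus the actual returns.

```python
import math
import statistics

set_a = [0.05, 0.07, 0.09, 0.06, 0.08]
set_b = [0.05, 0.07, 0.09, 0.06, 0.50]  # same first four years, 50% in year five

for name, returns in [("Set A", set_a), ("Set B", set_b)]:
    factors = [1 + r for r in returns]               # growth factors, e.g. 5% -> 1.05
    arithmetic = statistics.fmean(returns)
    geometric = statistics.geometric_mean(factors) - 1
    naive_growth = (1 + arithmetic) ** len(returns)  # $1 compounded at the arithmetic rate
    actual_growth = math.prod(factors)               # $1 after the actual returns
    print(f"{name}: arithmetic {arithmetic:.2%}, geometric {geometric:.2%}, "
          f"$1 -> {naive_growth:.2f} (arithmetic) vs {actual_growth:.2f} (actual)")

# Set A: arithmetic 7.00%, geometric 6.99%, $1 -> 1.40 (arithmetic) vs 1.40 (actual)
# Set B: arithmetic 15.40%, geometric 14.26%, $1 -> 2.05 (arithmetic) vs 1.95 (actual)
```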

Conclusion: Why the Geometric Mean Matters

I hope that after reading this you have an appreciation for the geometric mean, which is especially valuable in fields dealing with multiplicative relationships and compounding effects. As you have seen, it can accurately represent central tendency in scenarios involving rates, ratios, and exponential growth, and this feature distinguishes it from the arithmetic mean. This makes it important in finance when calculating investment returns, in biology for population growth analysis, and in data science more generally, whenever we need to handle skewed datasets or evaluate machine learning models on imbalanced data.

Enroll in our Machine Learning Scientist with Python career track to keep learning and land yourself a role in the exciting fields of data science and machine learning. 


Author
Vinod Chugani

As an adept professional in Data Science, Machine Learning, and Generative AI, Vinod dedicates himself to sharing knowledge and empowering aspiring data scientists to succeed in this dynamic field.

Geometric Mean FAQs

What is the simplest definition of geometric mean?

The geometric mean is the nth root of the product of n numbers.

How does the geometric mean differ from the arithmetic mean?

While the arithmetic mean is calculated by adding numbers and dividing by the count, the geometric mean is calculated by multiplying numbers and taking the nth root.

When should I use the geometric mean instead of the arithmetic mean?

Use the geometric mean when dealing with ratios, percentages, or growth rates, especially over multiple periods. It's particularly useful for data sets with exponential growth or decay.

Can the geometric mean be used with negative numbers?

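No. The geometric mean is only defined for positive real numbers. A zero forces the product, and therefore the geometric mean, to zero regardless of the other values, and negative values can make the product negative, in which case an even root has no real value. For growth rates that include losses, average the growth factors (1 + r) instead; they stay positive as long as no loss reaches 100%.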

What is the relationship between geometric mean and logarithms?

The geometric mean can be calculated using logarithms, which is especially useful for large datasets or when dealing with very large numbers. By taking the logarithm of each number, calculating their arithmetic mean, and then taking the antilog of the result, you can obtain the geometric mean.

How does the geometric mean handle outliers compared to the arithmetic mean?

The geometric mean is less sensitive to extremely large values than the arithmetic mean. It tends to dampen the effect of very large values, making it a more robust measure of central tendency for skewed data distributions, especially those with positive skewness. It is, however, pulled down sharply by values close to zero.
