
Understanding Covariance: An Introductory Guide

Discover how covariance reveals relationships between variables. Learn how to calculate and interpret it across statistics, finance, and machine learning.
Jun 24, 2025  · 5 min read

Covariance plays a key role in statistics by revealing how two variables change in relation to each other. It’s central to ideas like correlation, principal component analysis (PCA), and regression.

In this guide, you’ll learn what covariance means, how to calculate it, and where it’s used, from financial modeling to machine learning.

And because covariance is just one important thing to know, make sure to enroll in our Statistics Fundamentals in Python skill track and/or our Introduction to Statistics in R course to keep learning.

What Is Covariance?

Covariance is a fundamental statistical measure of how two variables, x and y, change together. If the variables tend to increase or decrease together, covariance is positive. If one tends to increase while the other decreases, covariance is negative.

The mathematical definition of covariance for two random variables X and Y is:

$$\operatorname{Cov}(X, Y) = E\left[(X - \mu_x)(Y - \mu_y)\right]$$

where 𝜇x and 𝜇y are the means of X and Y, respectively.

For a sample of size n, the sample covariance is calculated as:

$$\operatorname{Cov}(x, y) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where x̄ and ȳ are the sample means of x and y.

The decision to use μ (mu) for population means and x̄ and ȳ for sample means is a convention that helps distinguish the two.

Why Covariance Matters

Understanding covariance helps you analyze the relationship between two variables. In finance, covariance is used to assess how two stocks move together. In data science, covariance is needed for techniques like PCA, which reduces the dimensionality of datasets. It also underpins regression analysis, where understanding how variables co-vary is important for modeling their linear relationships.

In a nutshell, covariance provides insight into 1) the direction of the linear relationship between variables, 2) the strength of the relationship (though not standardized), and 3) the foundation for calculating the correlation coefficient.

Calculating Covariance by Hand

Let’s practice. To calculate sample covariance by hand, follow these steps:

  1. Find the mean of each variable.
  2. Subtract the mean from each value to get the deviations.
  3. Multiply the deviations for corresponding pairs.
  4. Sum the products.
  5. Divide by n − 1 for the sample covariance.

For example, given two variables:

  • x: 2, 4, 6
  • y: 5, 9, 13

Find the mean of each variable

First, calculate the means:

$$\bar{x} = \frac{2 + 4 + 6}{3} = 4$$

$$\bar{y} = \frac{5 + 9 + 13}{3} = 9$$

Subtract the mean from each value to get the deviations

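Next, compute the deviations from the mean. I created a table to show how this works. Notice how, on the right side of the table, the mean (4 for x, 9 for y) is subtracted from each data point.

| x | y  | x − x̄      | y − ȳ       |
|---|----|------------|-------------|
| 2 | 5  | 2 − 4 = −2 | 5 − 9 = −4  |
| 4 | 9  | 4 − 4 = 0  | 9 − 9 = 0   |
| 6 | 13 | 6 − 4 = 2  | 13 − 9 = 4  |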

Multiply the deviations for corresponding pairs

Now, multiply the deviation for each pair:

$$(-2)(-4) = 8, \qquad (0)(0) = 0, \qquad (2)(4) = 8$$

Sum the products

Then, we sum the products:

$$\sum_{i=1}^{3} (x_i - \bar{x})(y_i - \bar{y}) = 8 + 0 + 8 = 16$$

Divide by n − 1 for the sample covariance

Finally, we divide by n − 1 to get sample covariance.

$$\frac{16}{n - 1} = \frac{16}{3 - 1} = 8$$

We can write our answer like so:

$$\operatorname{Cov}(x, y) = 8$$
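If you'd like to double-check the arithmetic, the five steps can be sketched in plain Python. This is only a minimal illustration; the variable names (dev_x, products, and so on) are my own and not part of any library.

```python
# Plain-Python walk-through of the five by-hand steps (no libraries needed)
x = [2, 4, 6]
y = [5, 9, 13]
n = len(x)

# Step 1: find the mean of each variable
mean_x = sum(x) / n  # 4.0
mean_y = sum(y) / n  # 9.0

# Step 2: subtract the mean from each value to get the deviations
dev_x = [xi - mean_x for xi in x]  # [-2.0, 0.0, 2.0]
dev_y = [yi - mean_y for yi in y]  # [-4.0, 0.0, 4.0]

# Step 3: multiply the deviations for corresponding pairs
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]  # [8.0, 0.0, 8.0]

# Step 4: sum the products
total = sum(products)  # 16.0

# Step 5: divide by n - 1 for the sample covariance
sample_cov = total / (n - 1)
print(sample_cov)  # 8.0
```

Each intermediate list lines up with a column or row from the worked example, so you can compare them one by one.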

Covariance in Python and R

You might be trying to figure out covariance in a programming environment. I’ll show you how to do this in Python and R, starting with Python. 

Covariance in Python

You can calculate covariance in Python using NumPy.

To use NumPy’s cov() function, first import NumPy and define your data:

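import numpy as np

# Paired observations from the worked example
x = np.array([2, 4, 6])
y = np.array([5, 9, 13])

# ddof=1 divides by n - 1, giving the sample covariance
cov_matrix = np.cov(x, y, ddof=1)
print(cov_matrix)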

The output is a covariance matrix:

[[ 4.  8.]
 [ 8. 16.]]

We see that the covariance between the two variables is 8, which is the same result as we got by hand.
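If you only need the single covariance value rather than the whole matrix, you can index the off-diagonal entry. The sketch below also shows, as an aside, how the ddof argument switches between the sample formula (divide by n − 1) and the population formula (divide by n). The variable names are mine.

```python
import numpy as np

x = np.array([2, 4, 6])
y = np.array([5, 9, 13])

# The off-diagonal entry [0, 1] is the covariance between x and y
sample_cov = np.cov(x, y, ddof=1)[0, 1]      # divides by n - 1
population_cov = np.cov(x, y, ddof=0)[0, 1]  # divides by n

print(sample_cov)      # 8.0
print(population_cov)  # 16 / 3, roughly 5.33
```

The two results differ noticeably here because n is only 3; with larger samples the gap between dividing by n and by n − 1 shrinks.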

Covariance in R

You can calculate covariance in R using the built-in cov() function.

To get started, define your data vectors and pass them to cov():

x <- c(2, 4, 6)
y <- c(5, 9, 13)

cov_matrix <- cov(cbind(x, y))
print(cov_matrix)

The output is a covariance matrix:

  x  y
x 4  8
y 8 16

The covariance between the two variables is 8, just like in the Python example.

Interpreting the Covariance Matrix

The covariance matrix summarizes the pairwise covariance between multiple variables. The output we just saw from Python and R code was a covariance matrix, albeit a small one (2x2).

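Let's try a larger example. For three variables x, y, and z, the covariance matrix is:

$$
\begin{bmatrix}
\operatorname{Var}(x) & \operatorname{Cov}(x, y) & \operatorname{Cov}(x, z) \\
\operatorname{Cov}(y, x) & \operatorname{Var}(y) & \operatorname{Cov}(y, z) \\
\operatorname{Cov}(z, x) & \operatorname{Cov}(z, y) & \operatorname{Var}(z)
\end{bmatrix}
$$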

This matrix is symmetric, and the diagonal elements are the variances of each variable. (This is true because the covariance of a variable with itself is the variance.)
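To see these two properties in practice, here is a small sketch with three variables. The x and y values are the ones used earlier in this guide; the z values are made up purely for illustration.

```python
import numpy as np

x = np.array([2, 4, 6])
y = np.array([5, 9, 13])
z = np.array([9, 4, 2])  # hypothetical third variable, invented for this sketch

# Each row passed to np.cov is treated as one variable
cov_matrix = np.cov(np.vstack([x, y, z]), ddof=1)
print(cov_matrix)

# Symmetric: Cov(x, y) equals Cov(y, x), and so on
is_symmetric = np.allclose(cov_matrix, cov_matrix.T)
print(is_symmetric)

# The diagonal entries match the sample variances of x, y, and z
variances = np.array([x.var(ddof=1), y.var(ddof=1), z.var(ddof=1)])
diagonal_is_variance = np.allclose(np.diag(cov_matrix), variances)
print(diagonal_is_variance)
```

With this made-up z, the entries involving z come out negative, because z falls as x and y rise.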

Covariance vs. Correlation

While covariance indicates the direction of the linear relationship between two variables, its magnitude depends on the variables' units, so it is not standardized. Correlation rescales covariance to a value between −1 and 1, making it easier to interpret the strength of the relationship.

There are several formulas for the correlation coefficient. One common form is:

$$r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$

Where:

  • Cov(x, y) is the covariance between variables x and y
  • σx (pronounced as 'sigma') is the standard deviation of x
  • σy is the standard deviation of y
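As a quick check, the ratio of covariance to the product of the standard deviations can be computed directly and compared with NumPy's corrcoef() function. This is a minimal sketch using the same x and y as earlier; note that the covariance and the standard deviations need to use the same ddof for the ratio to be consistent.

```python
import numpy as np

x = np.array([2, 4, 6])
y = np.array([5, 9, 13])

# Sample covariance between x and y
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Use the same ddof for the standard deviations as for the covariance
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# NumPy's built-in correlation coefficient, for comparison
r_builtin = np.corrcoef(x, y)[0, 1]

print(r)
print(r_builtin)
```

Both values come out at 1 (up to floating-point rounding), since y is an exact linear function of x in this data: y = 2x + 1.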

Additional Things to Know

When working with covariance, be aware of these common issues:

  • Covariance is sensitive to the scale of the variables. Large values can inflate the result.
  • Covariance does not indicate the strength of the relationship in a standardized way.
  • Outliers can significantly affect the covariance calculation.

Always consider standardizing your data or using correlation for clearer interpretation.
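To make the scale issue concrete, the sketch below measures the same heights in meters and in centimeters. The numbers are hypothetical, chosen only to illustrate the point: changing the unit multiplies the covariance by 100, while the correlation stays the same.

```python
import numpy as np

# Hypothetical measurements, invented for illustration
height_m = np.array([1.60, 1.70, 1.75, 1.82, 1.90])   # heights in meters
weight_kg = np.array([55.0, 68.0, 72.0, 80.0, 91.0])  # weights in kilograms

height_cm = height_m * 100  # same heights, expressed in centimeters

cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]

corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(cov_m, cov_cm)    # the second value is 100 times the first
print(corr_m, corr_cm)  # essentially identical
```

Nothing about the underlying relationship changed between the two runs, which is why a raw covariance value is hard to interpret on its own.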

Conclusion

Covariance is a must-know statistical tool for understanding how variables move together. You need to know covariance to really understand your data relationships. Don’t worry if some aspects still feel unclear: our Statistics Fundamentals in Python skill track and our Introduction to Statistics in R course are good places to keep learning.


Author
Josef Waples

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess! 

Covariance FAQs

What does a covariance of 0 mean?

It means the two variables have no linear relationship, but they may still be dependent in a nonlinear way.
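Here is a small illustration, using made-up data, of two variables that are perfectly dependent yet have zero covariance. Because y is x squared and the x values are symmetric around zero, the positive and negative products of deviations cancel out.

```python
import numpy as np

# Made-up data: x is symmetric around zero
x = np.array([-2, -1, 0, 1, 2])
y = x ** 2  # y is completely determined by x, but not linearly

cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)  # 0 (up to floating-point rounding)
```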

How is covariance used in finance?

It helps assess how two assets move together and is used in portfolio optimization and risk management.

What’s the difference between covariance and correlation?

Covariance shows only the direction of a relationship. Correlation shows both direction and strength, normalized between −1 and 1.

Can covariance be negative?

Yes, a negative covariance indicates an inverse relationship.

Is covariance affected by units?

Yes. Unlike correlation, covariance carries the product of the two variables' units (e.g., cm·years).
