Covariance plays a key role in statistics by revealing how two variables change in relation to each other. It’s central to ideas like correlation, principal component analysis, and regression.
In this guide, you’ll learn what covariance means, how to calculate it, and where it’s used, from financial modeling to machine learning.
And because covariance is just one important thing to know, make sure to enroll in our Statistics Fundamentals in Python skill track and/or our Introduction to Statistics in R course to keep learning.
What Is Covariance?
Covariance is a fundamental statistical function that measures how two variables, x and y, change together. If the variables tend to increase or decrease simultaneously, covariance is positive. If one increases while the other decreases, covariance is negative.
The mathematical definition of covariance for two random variables X and Y is:

Cov(X, Y) = E[(X − μx)(Y − μy)]

where μx and μy are the means of X and Y, respectively.
For a sample of size n, the sample covariance is calculated as:

cov(x, y) = Σ (xᵢ − x̅)(yᵢ − ȳ) / (n − 1)

where x̅ and ȳ are the sample means of x and y.
The decision to use μ (mu) for population means and x̅ for sample means is a convention that helps distinguish the two.
Why Covariance Matters
Understanding covariance helps you analyze the relationship between two variables. In finance, covariance is used to assess how two stocks move together. In data science, covariance is needed for techniques like PCA, which reduces the dimensionality of datasets. It also underpins regression analysis, where understanding how variables co-vary is important for modeling their linear relationships.
In a nutshell, covariance provides insight into 1) the direction of the linear relationship between variables, 2) the strength of the relationship (though not standardized), and 3) the foundation for calculating the correlation coefficient.
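The sign convention is easy to see with made-up data. In this minimal sketch (the arrays are illustrative, not from the article), one pair of variables rises together and another moves in opposite directions:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# y_up rises with x -> positive covariance
y_up = np.array([2.0, 4.0, 6.0, 8.0])

# y_down falls as x rises -> negative covariance
y_down = np.array([8.0, 6.0, 4.0, 2.0])

print(np.cov(x, y_up)[0, 1])    # positive
print(np.cov(x, y_down)[0, 1])  # negative
```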
Calculating Covariance by Hand
Let’s practice. To calculate sample covariance by hand, follow these steps:
- Find the mean of each variable.
- Subtract the mean from each value to get the deviations.
- Multiply the deviations for corresponding pairs.
- Sum the products.
- Divide by n − 1 for the sample covariance.
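The five steps above can be sketched as a small pure-Python function (no libraries needed); `sample_covariance` is just an illustrative name:

```python
def sample_covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n                 # Step 1: find the means
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]     # Step 2: deviations from the mean
    dev_y = [y - mean_y for y in ys]
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]  # Step 3: multiply pairs
    total = sum(products)                # Step 4: sum the products
    return total / (n - 1)              # Step 5: divide by n - 1

print(sample_covariance([2, 4, 6], [5, 9, 13]))  # 8.0
```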
For example, given two variables:
- x: 2, 4, 6
- y: 5, 9, 13
Find the mean of each variable
First, calculate the means:

- x̅ = (2 + 4 + 6) / 3 = 4
- ȳ = (5 + 9 + 13) / 3 = 9
Subtract the mean from each value to get the deviations
Next, compute the deviations from the mean. Each x value has 4 subtracted from it, and each y value has 9 subtracted from it:

- x deviations: 2 − 4 = −2, 4 − 4 = 0, 6 − 4 = 2
- y deviations: 5 − 9 = −4, 9 − 9 = 0, 13 − 9 = 4
Multiply the deviations for corresponding pairs
Now, multiply the deviations for each pair:

- (−2)(−4) = 8
- (0)(0) = 0
- (2)(4) = 8
Sum the products
Then, we sum the products: 8 + 0 + 8 = 16
Divide by n − 1 for the sample covariance
Finally, we divide by n − 1 = 2 to get the sample covariance. We can write our answer like so:

cov(x, y) = 16 / 2 = 8
Covariance in Python and R
In practice, you’ll usually compute covariance in a programming environment. I’ll show you how to do this in Python and R, starting with Python.
Covariance in Python
You can calculate covariance in Python using NumPy.
To use NumPy’s cov() function, first import NumPy and define your data:
import numpy as np
x = np.array([2, 4, 6])
y = np.array([5, 9, 13])
cov_matrix = np.cov(x, y, ddof=1)
print(cov_matrix)
The output is a covariance matrix:
[[ 4.  8.]
 [ 8. 16.]]
We see that the covariance between the two variables is 8, which is the same result as we got by hand.
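To pull individual values out of that matrix, you can index it directly; the off-diagonal entries hold the covariance, while the diagonal holds each variable’s variance:

```python
import numpy as np

x = np.array([2, 4, 6])
y = np.array([5, 9, 13])

cov_matrix = np.cov(x, y, ddof=1)

cov_xy = cov_matrix[0, 1]  # covariance between x and y
var_x = cov_matrix[0, 0]   # variance of x
var_y = cov_matrix[1, 1]   # variance of y

print(cov_xy)  # 8.0
print(var_x)   # 4.0
print(var_y)   # 16.0
```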
Covariance in R
You can calculate covariance in R using the built-in cov() function. To get started, define your data vectors and pass them to cov():
x <- c(2, 4, 6)
y <- c(5, 9, 13)
cov_matrix <- cov(cbind(x, y))
print(cov_matrix)
The output is a covariance matrix:
  x  y
x 4  8
y 8 16
The covariance between the two variables is 8, just like in the Python example.
Interpreting the Covariance Matrix
The covariance matrix summarizes the pairwise covariance between multiple variables. The output we just saw from Python and R code was a covariance matrix, albeit a small one (2x2).
Let's try a larger example. For three variables x, y, and z, the covariance matrix is:

| Var(x)     Cov(x, y)  Cov(x, z) |
| Cov(y, x)  Var(y)     Cov(y, z) |
| Cov(z, x)  Cov(z, y)  Var(z)    |
This matrix is symmetric, and the diagonal elements are the variances of each variable. (This is true because the covariance of a variable with itself is the variance.)
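You can check both properties in NumPy. In this sketch, z is a made-up third variable added alongside the x and y from earlier; np.cov treats each row of the stacked array as one variable:

```python
import numpy as np

x = np.array([2, 4, 6])
y = np.array([5, 9, 13])
z = np.array([1, 1, 4])  # illustrative third variable

# Stack the variables as rows, then compute the 3x3 covariance matrix
data = np.vstack([x, y, z])
cov_matrix = np.cov(data, ddof=1)

print(cov_matrix.shape)                       # (3, 3)
print(np.allclose(cov_matrix, cov_matrix.T))  # True: the matrix is symmetric
print(np.diag(cov_matrix))                    # diagonal = variances of x, y, z
```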
Covariance vs. Correlation
While covariance measures the direction of the relationship between two variables, it does not standardize the result. Correlation standardizes covariance to a value between −1 and 1, making it easier to interpret the strength of the relationship.
There are many formulas for the correlation coefficient, but one common form is:

r = Cov(x, y) / (σx σy)
Where:
- Cov(x,y) is the covariance between variables x and y
- σx (sigma) is the standard deviation of x
- σy is the standard deviation of y
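Applying this formula to the earlier example, we can divide the covariance by the two sample standard deviations and compare the result against NumPy’s built-in np.corrcoef:

```python
import numpy as np

x = np.array([2, 4, 6])
y = np.array([5, 9, 13])

# Correlation = covariance divided by the product of standard deviations
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(r)                        # 1.0 -- the relationship is perfectly linear
print(np.corrcoef(x, y)[0, 1])  # matches NumPy's built-in correlation
```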
Additional Things to Know
When working with covariance, be aware of these common issues:
- Covariance is sensitive to the scale of the variables. Large values can inflate the result.
- Covariance does not indicate the strength of the relationship in a standardized way.
- Outliers can significantly affect the covariance calculation.
Always consider standardizing your data or using correlation for clearer interpretation.
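The scale sensitivity is easy to demonstrate. In this sketch (with made-up height and weight data), converting heights from meters to centimeters multiplies the covariance by 100, while the correlation is unchanged:

```python
import numpy as np

height_m = np.array([1.6, 1.7, 1.8, 1.9])   # illustrative heights in meters
weight = np.array([55.0, 62.0, 70.0, 80.0]) # illustrative weights in kg
height_cm = height_m * 100                  # same heights, rescaled

# Covariance scales with the units...
print(np.cov(height_m, weight)[0, 1])
print(np.cov(height_cm, weight)[0, 1])  # ~100x larger

# ...but correlation does not.
print(np.isclose(np.corrcoef(height_m, weight)[0, 1],
                 np.corrcoef(height_cm, weight)[0, 1]))  # True
```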
Conclusion
Covariance is a must-know statistical tool for understanding how variables move together, and you need it to really understand the relationships in your data. Don’t worry if some aspects still feel unclear; we have the right resources to help you, so enroll today:
- Statistics Fundamentals in Python skill track
- Introduction to Statistics in R course

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess!
Covariance FAQs
What does a covariance of 0 mean?
It means the two variables have no linear relationship, but they may still be dependent in a nonlinear way.
How is covariance used in finance?
It helps assess how two assets move together and is used in portfolio optimization and risk management.
What’s the difference between covariance and correlation?
Covariance shows only the direction of a relationship; correlation shows both direction and strength, normalized between −1 and 1.
Can covariance be negative?
Yes, a negative covariance indicates an inverse relationship.
Is covariance affected by units?
Yes. Unlike correlation, covariance retains units: the product of the two variables’ units (e.g., cm·years for height and age).