Understanding skewness can help identify potential outliers or anomalies in the data, which could indicate data quality issues or areas that require further investigation. Skewness also provides insights into the asymmetry of the data, which impacts the choice of statistical methods and downstream models. Thus, understanding the skewness of the underlying data is crucial in analytics projects.
In this tutorial, we will learn the concept of skewness, its mathematical definitions, and the different types of skewness in data distributions. Using a simple real-world example, we will calculate skewness using Microsoft Excel and interpret the results. If, at the end of this tutorial, you are looking to expand your skills further, consider exploring our Data Analysis in Excel course.
The Quick Answer: How to Calculate Skewness in Excel
To calculate skewness in Excel, use either the SKEW()
function or the SKEW.P()
function.
Follow these steps:
- Enter your data in a single column.
- In a new cell, simply type
=SKEW(range)
or=SKEW.P(range)
, whererange
is the range of your data cells. - Press Enter to get the skewness value. Skewness is represented by a single number.
Calculating skewness using the SKEW() function. Image by Author
The SKEW()
function assumes that the data provided represents a sample from a population. It divides by n−1
where n
is the number of data points. This is known as Bessel's correction.
The SKEW.P()
function calculates the skewness of an entire population. It divides by n
, not n-1
.
What is Skewness?
Skewness is a statistical metric that measures the asymmetry of a distribution. It indicates whether the data points tend to fall more to one side of the mean than the other.
The four moments of a distribution are:
- Center: The first moment indicates the central location or average of the data. It provides a measure of the typical value around which the data points are distributed.
- Spread: The second moment measures the dispersion or variability of the data. It quantifies how much the data points deviate from the mean, reflecting the extent of spread in the dataset.
- Skewness: The third moment quantifies the asymmetry of the data distribution around its mean. It indicates whether the data distribution is skewed to the left or right.
- Kurtosis: The fourth moment describes the "tailedness" or peakedness of the data distribution. It provides insight into the weight of the tails and the sharpness of the peak.
Skewed Distributions
Depending on whether the skewness coefficient is positive, negative, or zero, there are three types of skewness.
Skewed distributions. Source: Wikipedia
- Positive Skewness (Right Skewness): The tail on the right side of the distribution is longer than the left. Here, most data points are concentrated on the left, with some extremely high values on the right. The mean will be typically greater than the median in this type of distribution.
- Zero Skewness (Symmetrical Distribution): These distributions are perfectly symmetrical around the mean. The tails on both sides of the mean are balanced. The mean and median are seen to be equal.
- Negative Skewness (Left Skewness): The tail on the left side of the distribution is longer than the right. In these distributions, data points are concentrated on the right, with some extremely low values on the left. We can observe that the mean is typically less than the median.
While skewness focuses on the spread (tails) of the distribution, another statistical metric, kurtosis, focuses more on the height. It tells us how peaked or flat our normal (or normal-like) distribution is. Together, they highlight both the asymmetry and the extremity of deviations in the dataset. Our separate tutorial, Understanding Skewness And Kurtosis And How to Plot Them, explains why these metrics are used together to provide a complete picture of data distribution.
Now that we’ve understood the concept of skewness and its various types, let’s calculate it for sample data using Excel.
Advance Your Career with Excel
Gain the skills to maximize Excel—no experience required.
Methods to Calculate Skewness in Excel
Imagine you’re a data analyst tasked to analyze customer wait times at a popular retail chain. The company is experiencing varied customer satisfaction levels across its stores, and management suspects long wait times at checkout might be a contributing factor.
To address this, they have sampled data on customer wait times from a store over a few days.
Customer wait time dataset. Image by Author
As part of analyzing customer wait times, you’ve realized that calculating the skewness metric will provide useful insights. Here are two methods you could utilize to calculate the skewness metric using Microsoft Excel.
Using the SKEW() function
The function computes the degree of skewness of a sample. Here is the syntax of the SKEW()
function:
SKEW(number1, [number2], ...)
For the SKEW()
function, we can include up to 255 parameters, each of which can be a single data point or an array of data points.
For our data in hand, we would enter the following formula in a cell, say F2:
=SKEW(B2:B13)
Calculating the skewness using the SKEW() function. Image by Author
Next, once we press Enter on the cell, we can see the skewness coefficient calculated:
Skewness result. Image by Author
The positive output indicates that the wait times are right-skewed. Before interpreting the value, let’s explore another method to find the skewness metric.
Using the Data Analysis Toolpak
The Data Analysis Toolpak is an additional add-in for Microsoft Excel that helps calculate metrics commonly used in data analytics tasks.
Since this add-in isn’t enabled by default, you must check the top right of your Excel workbook for the Data Analysis icon under the Data tab, as shown below.
Data Analysis ToolPak in Excel. Image by Author
If you don’t see the icon, it hasn't been enabled. To enable it, click on File from the menu and select Options.
Selecting Options from the File Tab. Image by Author
Select Add-ins when the Excel Options dialog box opens.
Select Add-ins from the Excel Options dialog box. Image by Author
Next, select Excel Add-ins in the Manage box at the bottom, and click Go.
Managing Excel add-ins. Image by Author
Check Analysis ToolPak once the Add-Ins dialog box opens and click OK.
Enabling Data Analysis ToolPak. Image by Author
The Data Analysis icon will be visible under the Data tab now, and you need not repeat this process, as enabling the add-in is a one-time task.
As the first step towards calculating the skewness using the enabled add-in, select the range of data points, including the column header, and click on the Data Analysis icon, as shown below.
Invoking the Data Analysis Toolpak add-in. Image by Author
Next, select Descriptive Statistics from the dialog box and click OK.
Selecting descriptive statistics. Image by Author
You will be prompted with a dialog box, as shown below.
Filling in details for descriptive statistics. Image by Author
The Input Range has been prefilled since we selected it before invoking the add-in. Check the Labels in the first row box, and for the Output Range, we select a cell in the same sheet, say $E$5
. Check the Summary statistics box, which includes the skewness metric.
Click OK to see the calculated results in the same Excel sheet.
Descriptive statistics results. Image by Author
That’s it! You’ve calculated the skewness and other useful descriptive statistics in one go.
If you are interested in learning more about the Data Analysis Toolpak, check out our Comprehensive Guide to Using ANOVA in Excel tutorial for another useful example.
Interpreting the Skewness Results
Now that we have the skewness and other descriptive statistics, we can interpret and uncover some useful insights about the sample of data.
- Right-Skewed Distribution: Using both methods, we get the same value of 1.0128 as the skewness metric calculated for the customer wait times at the retail store, indicating a right-skewed (positively skewed) distribution. This means there are outliers in the data with unusually high wait times.
- Mean versus Median: The mean wait times are higher than the median. This suggests that while most customers experience relatively short wait times, there are many instances where wait times are considerably longer, pulling the average up. So median is probably a better measure of center.
- Business Insights: The mode of the wait times is 5 minutes, the most frequently occurring wait time, and indicates that most customers are served relatively quickly. However, the presence of right skewness and a range of 8 minutes highlights that there are significant variations in wait times. The management should focus on reducing these variations to improve overall customer satisfaction.
Suppose the obtained skewness coefficient is negative — how would your interpretation change? Here’s how we can interpret negative skewness values for customer wait times.
- Left-Skewed Distribution: A negative skewness value would indicate that customer wait times follow a left-skewed distribution, with the tail on the left side of the distribution (the lower values) being longer than the right side.
- Low-Valued Outliers: This would suggest that the bulk of the data is concentrated towards the higher end of the scale, with a few lower outliers of extremely low wait times.
- Business Insights: A negatively skewed distribution of customer wait times could indicate efficient operations for the majority of the time. However, it’s important to explore what factors contribute to the exceptionally low values, as these might reveal efficiencies or operational practices that could be leveraged more broadly to improve overall service.
Keep in mind that Calculating other descriptive statistics that complement the skewness metrics helps us reinforce our findings. This is a common practice in data analytics.
A Deep Dive Into What the Excel Skewness Functions Are Doing
In most practical applications, particularly those involving general statistical analysis or software tools like Excel, the momental skewness is commonly used due to its robustness and comprehensive nature. It is often the default choice unless specific characteristics of the dataset or analysis goals suggest otherwise.
Momental skewness, also known as the third standardized moment, is a statistical measure used to quantify the asymmetry of a data distribution around its mean.
The theoretical definition calculates this by dividing the third central moment of the distribution, which captures the degree of deviation from the mean cubed, by the cube of the standard deviation. However, when analyzing a sample from a larger population, particularly in practical data analysis scenarios, the derived formula for sample skewness is: