
Understanding Skewness And Kurtosis And How to Plot Them

A comprehensive visual guide to skewness and kurtosis and how they affect distributions and, ultimately, your data science project.
Updated Dec 2023  · 10 min read

After collecting data and spending hours cleaning it, you can finally start exploring! This stage, often called Exploratory Data Analysis (EDA), is perhaps the most important step in a data project. The insights gained from EDA affect everything down the line.

For example, one of the must-do steps in EDA is checking the shapes of distributions. Correctly identifying the shape influences many decisions later on in the project such as:

  • Further preprocessing steps
  • Whether or not to perform outlier detection and, possibly, removal
  • Feature transformation or scaling steps
  • Feature selection
  • Algorithm selection

and so on. While there are visuals for this task, you need more reliable metrics to quantify various characteristics of distributions. Two such metrics are skewness and kurtosis. You can use them to assess how closely your distributions resemble a perfect normal distribution.

By finishing this article, you will learn in detail:

  • What skewness and kurtosis are
  • The types of skewness and kurtosis
  • The effect of skewness and kurtosis on machine learning models
  • Calculating skewness and kurtosis in Python manually and with third-party packages
  • Visualizing distributions to verify skewness and kurtosis scores (comprehensive and fun section)

Let’s get started!

What is Skewness?

We see the normal distribution everywhere: human body measurements, weights of objects, IQ scores, test results, or even at the gym:

A weight machine at the gym, its pin holes collectively forming a normal distribution shape from years of use.

Besides being nature’s favorite distribution, it is universally loved by almost all machine learning algorithms. While some merely perform better and more stably on it, others outright refuse to work well with anything other than a normal distribution (I am talking to you, linear models).

So, to satisfy the algorithms’ need for normality, we need a way to measure how similar (or dissimilar) our own distributions are to the perfect bell-shaped curve.

Perfect bell-shaped curve of a normal distribution.

Let’s start with the tails. In a perfect normal distribution, the tails are equal in length. But when there is asymmetry between the tails, giving the distribution a leaning, squished-to-one-side look, we say it is skewed. And you guessed it, we measure the extent of this asymmetry with skewness.

Different types of skewness on either side of a non-skewed normal distribution.

Correctly categorizing and measuring skewness provides insight into how values are spread around the mean and influences the choice of statistical techniques and data transformations. For example, highly skewed distributions might benefit from transformations that make them resemble a normal distribution, which can aid model performance.
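
As a quick illustration, here is a minimal sketch (using the diamonds dataset that this article loads again in the next sections): a log transform often tames heavy right skew:

import numpy as np
import seaborn as sns

# A heavily right-skewed feature: diamond prices
prices = sns.load_dataset("diamonds")["price"]

# Log-transforming often pulls a right-skewed distribution
# much closer to a symmetric, normal-like shape
print(f"Skewness before log transform: {prices.skew():.3f}")
print(f"Skewness after log transform:  {np.log(prices).skew():.3f}")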

Types of Skewness

There are three types of skewness: positive, negative, and zero skewness.

Let’s start with the last one. A distribution with zero skewness has the following characteristics:

  • Symmetric distribution with values evenly centered around the mean.
  • No skew, lean or tail to either side.
  • The mean, median, and mode are all at the center point.

A normal distribution (zero-skewed) with mean, median and mode all aligned in a single line.

In practice, the mean, median, and mode may not overlap in a perfectly straight line. They may sit slightly apart from each other, but the difference is usually too small to matter.
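
To see this, here is a small sketch: even a sample drawn from a perfect normal distribution has a mean and median that are close but not identical:

import numpy as np

rng = np.random.default_rng(42)

# Draw a large sample from a true normal distribution
sample = rng.normal(loc=100, scale=15, size=10_000)

print(f"Mean:   {np.mean(sample):.3f}")
print(f"Median: {np.median(sample):.3f}")  # close to the mean, not identical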

In a distribution with positive skewness (right-skewed):

  • The right tail of the distribution is longer or fatter than the left.
  • The mean is greater than the median, and the mode is less than both mean and median.
  • Lower values are clustered in the “hill” of the distribution, while extreme values are in the long right tail.
  • It is also known as a right-skewed distribution.

Right skewed distribution with mean, median and mode annotated.

In a distribution with negative skewness (left-skewed):

  • The left tail of the distribution is longer or fatter than the right.
  • The mean is less than the median, and the mode is greater than both mean and median.
  • Higher values are clustered in the “hill” of the distribution, while extreme values are in the long left tail.
  • It is also known as a left-skewed distribution.

Left-skewed distribution with mean, median and mode annotated

To remember the difference between positive and negative skewness, think of it this way: to increase the mean of a distribution, you add values much higher than the current mean; to lower the mean, you do the opposite and introduce values much lower than it. So, if the majority of extreme values are higher than the mean, the skewness is positive because they pull the mean up. If the majority of extreme values are lower than the mean, the skewness is negative because they pull the mean down.
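
Here is a minimal sketch of that intuition: appending a few extreme values to a roughly symmetric sample drags the mean (and the sign of the skewness) in their direction:

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
base = rng.normal(loc=50, scale=5, size=1_000)

# A few extreme high values pull the mean above the median (positive skew);
# a few extreme low values do the opposite (negative skew)
right_skewed = np.append(base, [150, 160, 170])
left_skewed = np.append(base, [-60, -70, -80])

print(f"Right: mean={right_skewed.mean():.2f}, median={np.median(right_skewed):.2f}, skew={skew(right_skewed):.2f}")
print(f"Left:  mean={left_skewed.mean():.2f}, median={np.median(left_skewed):.2f}, skew={skew(left_skewed):.2f}")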

How to Calculate Skewness in Python

There are many ways to calculate skewness, but the simplest one is Pearson’s second skewness coefficient, also known as median skewness.

$$\text{Skewness} = \frac{3 \times (\text{mean} - \text{median})}{\text{standard deviation}}$$

Let’s implement the formula manually in Python:

import numpy as np
import pandas as pd
import seaborn as sns

# Example dataset
diamonds = sns.load_dataset("diamonds")
diamond_prices = diamonds["price"]

mean_price = diamond_prices.mean()
median_price = diamond_prices.median()
std = diamond_prices.std()

skewness = (3 * (mean_price - median_price)) / std

>>> print(
    f"Pearson's second skewness score of the diamond prices distribution is {skewness:.5f}"
)

Pearson's second skewness score of the diamond prices distribution is 1.15189

Another formula, heavily influenced by the work of Karl Pearson, is the moment-based formula for approximating skewness. It is more reliable and is given as follows:

$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

Here:

  • n represents the number of values in a distribution
  • x_i denotes each data point
  • x̄ and s denote the mean and standard deviation

Let’s implement it in Python too:

def moment_based_skew(distribution):
    n = len(distribution)
    mean = np.mean(distribution)
    std = np.std(distribution)

    # Divide the formula into two parts
    first_part = n / ((n - 1) * (n - 2))
    second_part = np.sum(((distribution - mean) / std) ** 3)

    skewness = first_part * second_part

    return skewness

>>> moment_based_skew(diamond_prices)
1.618440289857168

If you don’t want to calculate skewness manually (like me), you can use built-in methods from pandas or scipy:

# Pandas version
diamond_prices.skew()
1.618395283383529

# SciPy version
from scipy.stats import skew

skew(diamond_prices)
1.6183502776053016

While all formulas to approximate skewness return different scores, their differences are too small to be significant or change the categorization of the skew. For example, all methods we have used today leverage different formulas under the hood, but the results are very close.
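
For instance, SciPy’s skew function defaults to the biased (population) estimator, while pandas applies a sample-size correction. Passing bias=False to SciPy should bring the two scores almost exactly into line, as this quick check shows:

from scipy.stats import skew

# bias=False applies the same sample-size adjustment pandas uses
print(skew(diamond_prices, bias=False))
print(diamond_prices.skew())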

Once you calculate skewness, you can categorize the extent of the skew:

  • (-0.5, 0.5) — low or approximately symmetric.
  • (-1, -0.5) U (0.5, 1) — moderately skewed.
  • Below -1 or above 1 — highly skewed.
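
A small helper function (a sketch, not a standard library utility) makes these rule-of-thumb cutoffs easy to apply:

def categorize_skew(skewness):
    """Bucket a skewness score using the rule-of-thumb cutoffs above."""
    if abs(skewness) < 0.5:
        return "approximately symmetric"
    elif abs(skewness) <= 1:
        return "moderately skewed"
    return "highly skewed"

>>> categorize_skew(diamond_prices.skew())
'highly skewed'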

What is Kurtosis and its Types?

While skewness focuses on the spread (tails) of a distribution, kurtosis focuses more on the height. It tells us how peaked or flat our normal (or normal-like) distribution is. The term, which means curved or arched in Greek, was coined, unsurprisingly, by the British mathematician Karl Pearson (he spent his life studying probability distributions).

High kurtosis indicates:

  • Sharp peakedness in the distribution’s center.
  • More values concentrated around the mean than normal distribution.
  • Heavier tails because of a higher concentration of extreme values or outliers in tails.
  • Greater likelihood of extreme events.

On the other hand, low kurtosis indicates:

  • Flat peak.
  • Fewer values concentrated around the mean than in a normal distribution.
  • Lighter tails.
  • Lower likelihood of extreme events.

Depending on the degree, distributions have three types of kurtosis:

  1. Mesokurtic distribution (kurtosis = 3, excess kurtosis = 0): perfect normal distribution or very close to it.
  2. Leptokurtic distribution (kurtosis > 3, excess kurtosis > 0): sharp peak, heavy tails
  3. Platykurtic distribution (kurtosis < 3, excess kurtosis < 0): flat peak, light tails

Three different types of distributions: leptokurtic (high kurtosis), mesokurtic (zero excess kurtosis) and platykurtic (low kurtosis)

Note that excess kurtosis is defined as kurtosis - 3, so that a normal distribution has an excess kurtosis of 0. This makes kurtosis scores more interpretable.

How to Calculate Kurtosis in Python

You can calculate kurtosis in Python in the same way as skewness using pandas or SciPy:

from scipy.stats import kurtosis

kurtosis(diamond_prices)
2.177382669056634
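
One thing to note: SciPy’s kurtosis returns excess kurtosis by default, controlled by its fisher parameter; pass fisher=False to get raw kurtosis instead. A quick demonstration on a synthetic normal sample:

import numpy as np

normal_sample = np.random.default_rng(1).normal(size=100_000)

# fisher=True (the default) returns excess kurtosis, close to 0 here;
# fisher=False returns raw kurtosis, close to 3
print(kurtosis(normal_sample, fisher=True))
print(kurtosis(normal_sample, fisher=False))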

Pandas offers two functions for kurtosis: kurt and kurtosis. They are aliases of each other, and both work on Series as well as DataFrames.

diamond_prices.kurt()
2.17769575924869

# Select numeric features and calculate kurtosis
diamonds.select_dtypes(include="number").kurtosis()
carat     1.256635
depth     5.739415
table     2.801857
price     2.177696
x        -0.618161
y        91.214557
z        47.086619
dtype: float64

Again, the numbers differ slightly because pandas and SciPy use different formulas under the hood.
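
As with skewness, passing bias=False makes SciPy apply the same sample-size correction as pandas, so the two scores should line up almost exactly (a quick check):

# bias=False applies the sample-size correction pandas uses
print(kurtosis(diamond_prices, bias=False))
print(diamond_prices.kurt())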

If you want a manual calculation of kurtosis, you can use the following formula:

$$\text{Excess kurtosis} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - 3$$

Here:

  • n represents the number of observations in the dataset
  • x_i denotes each individual data point
  • x̄ and s denote the mean and standard deviation

We will implement the formula inside a function again:

def moment_based_kurtosis(distribution):
    n = len(distribution)
    mean = np.mean(distribution)
    std = np.std(distribution)

    # Average the fourth power of the standardized values,
    # then subtract 3 to get excess kurtosis
    kurtosis = (1 / n) * np.sum(((distribution - mean) / std) ** 4) - 3

    return kurtosis

>>> moment_based_kurtosis(diamond_prices)
2.1773826690576463

And we find that diamond prices have an excess kurtosis of 2.18, which means that if we plot the distribution, it will have a sharper peak than a normal distribution.

So, let’s do that!

Visualizing Skewness and Kurtosis in Python

One of the best visuals for seeing the shape, and thus the skewness and kurtosis, of a distribution is a kernel density estimate (KDE) plot. It is available through Seaborn:

import matplotlib.pyplot as plt

sns.kdeplot(diamond_prices)

plt.title("KDE plot of diamond prices")
plt.xlabel("Price ($)")

A KDE plot of diamond prices

This plot is in accordance with the numbers we saw up to this point: the distribution has a long right tail, indicating a positive skewness, and it has a very sharp peak, which corresponds with high kurtosis.

KDE isn’t the only plot to see the shape. We can use histograms as well:

sns.histplot(diamonds["carat"])

plt.xlabel("Carat")
plt.title("A histogram of the carat of diamonds")

A histogram of diamond carats

The disadvantage of histograms is that you have to choose the number of bins (the count of bars) yourself. Here, there are too many bars creating noise in the visual — we can’t clearly define the shape. So, let’s decrease the number of bins:

sns.histplot(diamonds["carat"], bins=25)

plt.xlabel("Carat")
plt.title("A histogram of the carat of diamonds")

A histogram with diamond carats with fewer bins

Now the shape is better defined, but we can improve it further. By setting kde=True inside histplot, we can plot a KDE of the distribution on top of the bars:

sns.histplot(diamonds["carat"], bins=25, kde=True)

plt.xlabel("Carat")
plt.title("A histogram of the carat of diamonds")

A histogram overlaid with a kde plot of diamond carats

The overlaid KDE line looks jagged, not the smooth curve that lets us see the general shape. The reason for the jaggedness is that the carat distribution is naturally spiky and far from a normal distribution.

But, we can decrease the KDE’s sensitivity to these fluctuations by adjusting the bandwidth. This is done using the bw_adjust parameter, which defaults to 1:

# Change the bandwidth from 1 to 3
sns.kdeplot(diamonds["carat"], bw_adjust=3, color="red")

plt.title("KDE of diamond carats");

A KDE of diamond carats with higher bandwidth adjustment

This version looks much less spiky than the overlaid KDE plot. To adjust the KDE bandwidth when using a histogram overlaid with a KDE, you can use the kde_kws parameter:

ax = sns.histplot(
    diamonds["carat"],
    kde=True,
    kde_kws=dict(bw_adjust=3),
    bins=25,
)

plt.title("An overlaid histogram of diamond carats");

A histogram overlaid with a KDE plot that has its bandwidth adjusted.

kde_kws accepts any parameter of the kdeplot function that controls the KDE computation.

One trick you can use when plotting KDEs is to remove everything except the KDE line. Since the main point of a KDE is to see the distribution shape, other details of the plot like the axis ticks, the spines, and labels are sometimes unnecessary:

sns.kdeplot(diamond_prices, color="red")

# Remove the spine from three sides
sns.despine(top=True, right=True, left=True)

# Remove the ticks and ticklabels
plt.xticks([])
plt.yticks([])
plt.ylabel("")
plt.xlabel("")

# Set a title
plt.title("Diamond prices", fontdict=dict(fontsize=20));

A KDE plot with everything removed - except the KDE line and the title.

This plot is much tidier. You can further improve the plot by adding lines to denote the position of mean, median, and mode:

sns.kdeplot(diamond_prices, color="red")

sns.despine(top=True, right=True, left=True)
plt.xticks([])
plt.yticks([])
plt.ylabel("")
plt.xlabel("")
plt.title("Diamond prices", fontdict=dict(fontsize=20))

# Find the mean, median, mode
mean_price = diamonds["price"].mean()
median_price = diamonds["price"].median()
mode_price = diamonds["price"].mode().squeeze()

# Add vertical lines at the position of mean, median, mode
plt.axvline(mean_price, label="Mean")
plt.axvline(median_price, color="black", label="Median")
plt.axvline(mode_price, color="green", label="Mode")

plt.legend();

A KDE plot with nothing but the KDE line, the title and mean, median, mode annotated.

This plot verifies what we discussed in the types of skewness section: in a positively skewed distribution, the mean is higher than the median, and the mode is lower than both mean and median.
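
You can also confirm the ordering numerically with the values computed above:

# For a positively skewed distribution: mode < median < mean
print(f"Mode:   {mode_price:,.2f}")
print(f"Median: {median_price:,.2f}")
print(f"Mean:   {mean_price:,.2f}")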

Conclusion

Skewness and kurtosis, often overlooked in Exploratory Data Analysis, reveal significant insights about the nature of distributions.

Skewness hints at data tilt, whether leaning left or right, revealing its asymmetry (if any). Positive skew means a tail stretching right, while negative skew veers in the opposite direction.

Kurtosis is all about peaks and tails. High kurtosis sharpens peaks and weighs down the tails, while low kurtosis spreads data, lightening the tails.

If you want to learn more about skewness and kurtosis, you can check out the quantitative analysis courses taught by industry experts on DataCamp.


Author
Bex Tuychiev

I am a data science content creator with over 2 years of experience and one of the largest followings on Medium. I like to write detailed articles on AI and ML with a bit of a sarcastic style, because you've got to do something to make them a bit less dull. I have produced over 130 articles and a DataCamp course to boot, with another one in the making. My content has been seen by over 5 million pairs of eyes, 20k of whom became followers on both Medium and LinkedIn.
