Data Demystified: Data Visualizations that Capture Distributions

In part 10 of Data Demystified, we continue our deep dive into data visualization with visualizations that capture distributions.
Sep 2022  · 8 min read

Welcome to part ten of our month-long Data Demystified series. As part of Data Literacy Month, this series will clarify key concepts from the world of data, answer the questions you may be too afraid to ask, and have fun along the way. If you want to start at the beginning, read our first entry in the series: What is a Dataset?

This week, we’ll cover common data visualizations and how to interpret them. Data visualization is often called the gateway drug into data science, and this blog post focuses on visualizations that capture distributions.

Visualizations that Capture Distribution

A key use case of data visualization is capturing the distribution of a variable. Capturing distributions allows you to understand critical statistical properties of the data you’re visualizing and helps audiences make educated, data-driven decisions on key outcomes. Before diving in, here are some key questions to keep in mind when visualizing distributions:

  1. What is the shape of the distribution? Is the distribution symmetrical or otherwise?
  2. What is the distribution’s spread, that is, the difference between the smallest and the largest value in the dataset?
  3. Are there any outliers in the distribution?
  4. Is there any pattern to the distribution? Is the distribution random, or is there an obvious shape?
  5. What is the typical value (mean, median, or mode)?

The four visualizations below help us answer these questions.
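Before looking at any chart, the questions above can also be checked numerically. The snippet below is a minimal sketch using Python's built-in statistics module. The salary figures are made up for illustration, and comparing the mean with the median is only a rough hint about shape, not a formal test.

```python
import statistics

# Hypothetical sample: monthly salaries in thousands (made-up values).
salaries = [32, 35, 36, 38, 40, 41, 41, 43, 45, 48, 52, 95]

# Three common ways to describe the typical value.
mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)

# Spread as described above: largest value minus smallest value.
spread = max(salaries) - min(salaries)

# A mean well above the median hints at a right-skewed shape,
# often caused by a few large values such as the 95 here.
shape = "right-skewed" if mean > median else "left-skewed or symmetric"

print(f"mean={mean:.1f} median={median} mode={mode} spread={spread} shape={shape}")
```

Numbers like these are a useful cross-check, but they do not replace the picture: two very different distributions can share the same mean and spread.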

Histograms

A histogram is a graph that uses bars to show the distribution of a numerical variable. It is a convenient way to illustrate the major features of the distribution, especially when the dataset is large. Key examples where histograms shine are capturing the salary distribution of employees in a company or the blood sugar levels of a cohort of patients.

A histogram depicting the age of death for Australian males in 2022 (Source: Oosterbaan)

To build a histogram, the numerical data is first divided into several ranges, or bins, and the number of values that fall into each bin is counted. The horizontal axis shows the bins, while the vertical axis represents the frequency or percentage of values in each bin. Histograms immediately showcase how a variable's distribution is skewed or where it peaks. Here are examples from our Data Visualization for Everyone course.
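The binning step described above can be sketched in a few lines of Python. This is an illustrative sketch rather than how any particular charting tool does it: the blood sugar readings are made up, the helper name bin_counts is hypothetical, and equal-width bins are an assumption.

```python
# Hypothetical blood sugar readings in mg/dL (made-up values).
readings = [82, 85, 88, 90, 91, 93, 95, 96, 98, 101, 104, 110, 118, 125, 139]

def bin_counts(values, num_bins):
    """Split the value range into equal-width bins and count the values in each bin."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The largest value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts

edges, counts = bin_counts(readings, num_bins=4)

# Draw a sideways text histogram: one row per bin, one '#' per value.
for i, count in enumerate(counts):
    print(f"{edges[i]:6.1f} to {edges[i + 1]:6.1f} | {'#' * count}")
```

With these readings the first bin holds the most values and the bars get shorter toward the right, which is the right-skewed pattern described above.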

A histogram can be symmetric, left-skewed, or right-skewed. (Source: DataCamp)

A histogram can have multiple modes. (Source: DataCamp)

While histograms and bar charts look alike, they serve distinct functions and should not be confused. Here are the key differences.

 

| | Histogram | Bar chart |
| --- | --- | --- |
| Functional difference | To display the distribution of a numerical variable. | To compare values across categories. |
| Visual difference | There is no space between each bar. | There is usually a space between bars. |

Density plots

Just like a histogram, a density plot represents the distribution of a numerical variable. Unlike a histogram, a density plot uses a smooth line instead of bars. The horizontal axis of a density plot is the numerical variable, while the vertical axis is the probability density. The probability that the variable falls within a given range is the area under the curve over that range.

The probability that a mouse has a birth weight between 1.0 and 1.2 grams is the area under the density curve between those two values (Source: SPSS)
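To make the "area under the curve" idea concrete, here is a small sketch that builds a density curve by hand and measures the area over a range. Everything in it is an assumption for illustration: the birth weights are made up, the curve is a Gaussian kernel density estimate (one common way to draw density plots, not necessarily the one behind the figure above), and the bandwidth uses a simple rule of thumb.

```python
from statistics import NormalDist, stdev

# Hypothetical mouse birth weights in grams (made-up values).
weights = [0.92, 0.98, 1.01, 1.04, 1.06, 1.09, 1.10, 1.13, 1.17, 1.21, 1.26, 1.33]

# The bandwidth controls how smooth the curve is. This is a simple
# normal-reference rule of thumb, chosen here only for illustration.
bandwidth = 1.06 * stdev(weights) * len(weights) ** (-1 / 5)

def density(x):
    """Height of the curve at x: the average of one small bell curve per data point."""
    return sum(NormalDist(w, bandwidth).pdf(x) for w in weights) / len(weights)

def probability_between(a, b):
    """Area under the density curve between a and b."""
    return sum(
        NormalDist(w, bandwidth).cdf(b) - NormalDist(w, bandwidth).cdf(a)
        for w in weights
    ) / len(weights)

print(f"density at 1.1 g: {density(1.1):.2f}")
print(f"P(1.0 <= weight <= 1.2): {probability_between(1.0, 1.2):.2f}")
```

Note that the height of the curve is not itself a probability and can be larger than 1. Only areas under the curve are probabilities, and the total area is always 1.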

A density plot can show the shape of a distribution more effectively than a histogram. A histogram with too few or too many bins might hide the actual shape of the underlying distribution. In contrast, a density plot does not require binning and displays a smooth distribution curve, although how smooth the curve looks still depends on a smoothing setting (the bandwidth).

The choice of bin count in a histogram is crucial. (Source: Laerd Statistics)
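The effect of the bin count is easy to try out. The sketch below counts the same made-up sample with three different bin counts. The sample (two clusters, near 20 and near 40), the helper name count_in_bins, and the chosen bin counts are all assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the made-up sample is reproducible

# Hypothetical sample with two clusters: one near 20 and one near 40.
sample = [random.gauss(20, 3) for _ in range(200)] + [
    random.gauss(40, 3) for _ in range(200)
]

def count_in_bins(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so the largest value falls into the last bin.
        counts[min(int((v - low) / width), num_bins - 1)] += 1
    return counts

# Same data, three bin counts: too few, moderate, and too many.
for num_bins in (2, 8, 40):
    print(f"{num_bins:>2} bins: {count_in_bins(sample, num_bins)}")
```

With very few bins, the detail within each cluster and the gap between them disappear. With very many bins, each bar holds only a handful of values, so random noise starts to look like structure.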

A density plot is also better than a histogram at comparing multiple distributions, because overlaid curves stay readable where overlaid bars tend to hide one another.

Comparing distributions with density plots vs histograms (Source: Will Koehrsen)

Box plots

Histograms are well-suited for displaying the overall distribution of the data, but box plots are excellent at summarizing a distribution. 

The anatomy of a box plot (Source: Galarnyk)

Visualizing data with a box plot reveals the following:

  1. The median: The middle value of a dataset, where 50% of the data is less than the median and 50% of the data is higher than the median.
  2. The upper quartile: The 75th percentile of a dataset, where 75% of the data is less than the upper quartile and 25% of the data is higher than the upper quartile.
  3. The lower quartile: The 25th percentile of a dataset, where 25% of the data is less than the lower quartile and 75% is higher than the lower quartile.
  4. The interquartile range: The upper quartile minus the lower quartile.
  5. The upper adjacent value: Or colloquially the “maximum”. It is the largest data point that is no more than 1.5 times the interquartile range above the upper quartile.
  6. The lower adjacent value: Or colloquially the “minimum”. It is the smallest data point that is no more than 1.5 times the interquartile range below the lower quartile.
  7. Outliers: Any values above the “maximum” or below the “minimum”.
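The quantities in this list can be computed directly. The following is a minimal sketch using Python's built-in statistics module. The bill amounts are made up, and two choices in it are assumptions: the "inclusive" quartile method (tools differ slightly in how they compute quartiles) and the convention that the whiskers end at the most extreme data points within 1.5 times the interquartile range of the quartiles.

```python
import statistics

# Hypothetical total bills in dollars (made-up values).
bills = [8, 10, 12, 13, 15, 16, 17, 18, 20, 21, 24, 48]

# quantiles(..., n=4) returns three cut points:
# the lower quartile, the median, and the upper quartile.
q1, median, q3 = statistics.quantiles(bills, n=4, method="inclusive")
iqr = q3 - q1

# Cut-off points 1.5 times the interquartile range beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [b for b in bills if lower_fence <= b <= upper_fence]
outliers = [b for b in bills if b < lower_fence or b > upper_fence]

# The whiskers end at the most extreme data points still inside the cut-offs.
lower_adjacent, upper_adjacent = min(inside), max(inside)

print(f"median={median} lower quartile={q1} upper quartile={q3} IQR={iqr}")
print(f"whiskers from {lower_adjacent} to {upper_adjacent}, outliers={outliers}")
```

In this sample the bill of 48 sits above the upper cut-off, so a box plot would draw it as a separate point rather than stretching the whisker out to reach it.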

Violin plots

A violin plot is a hybrid between a box plot and a density plot. 

A violin plot showing the distribution of total bill (Source: DataCamp)

Like a density plot, a violin plot displays the density of a distribution. Like a box plot, it also shows summary statistics. This makes violin plots an effective tool for displaying and summarizing the distribution of a numerical variable at the same time.

The anatomy of a violin plot (Source: Hintze and Nelson)

Get Started with Data Visualization Today

We hope you enjoyed this short introduction to data visualization. In the next series entry, we’ll look at how AI is covered in the news and how to grow a healthy skepticism around the latest advancements in the field. To start your data learning journey today, check out the following resources. 

Data Visualization in Spreadsheets (Beginner, 4 hr): Learn the fundamentals of data visualization using spreadsheets.