Data Demystified: Data Visualizations that Capture Distributions

In part 10 of Data Demystified, we’ll dive deep into the world of data visualization, continuing with visualizations that capture distributions.

Sep 2, 2022 · 8 min read

Welcome to part ten of our month-long Data Demystified series. As part of Data Literacy Month, this series will clarify key concepts from the world of data, answer the questions you may be too afraid to ask, and have fun along the way. If you want to start at the beginning, read our first entry in the series: What is a Dataset?

This week, we’ll cover common data visualizations and how to interpret them. Data visualization is often called the gateway drug into data science; this post focuses on visualizations that capture distributions.

Visualizations that Capture Distribution

A key use case of data visualization is capturing the distribution of a variable. Capturing distributions allows you to understand critical statistical properties of the data you’re visualizing and helps audiences make educated, data-driven decisions on key outcomes. Before diving in, here are some key questions to keep in mind when visualizing distributions:

What is the shape of the distribution? Is the distribution symmetrical or otherwise?
What is the distribution’s spread? One simple measure is the range: the difference between the smallest and the largest value in the dataset.
Are there any outliers in the distribution?
Is there any pattern to the distribution? Is the distribution random, or is there an obvious shape?
What is the average value (mean, mode, median)?

The four visualizations below help answer these questions.
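The same questions can also be checked numerically before any chart is drawn. The following is a minimal sketch using only Python's standard library. The function name `describe_distribution`, the salary figures, and the 1.5 times IQR rule for flagging outliers are illustrative assumptions, not something the series prescribes.

```python
import statistics

def describe_distribution(values):
    """Return simple numeric answers to the questions listed above."""
    ordered = sorted(values)
    mean = statistics.mean(ordered)
    median = statistics.median(ordered)
    # Quartiles via the "inclusive" method; other tools may use slightly different rules.
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Flag points more than 1.5 * IQR beyond the quartiles (a common rule of thumb).
    outliers = [v for v in ordered if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    return {
        "mean": mean,
        "median": median,
        "modes": statistics.multimode(ordered),
        "range": ordered[-1] - ordered[0],
        "outliers": outliers,
        # A mean well above the median hints at a right skew; well below, a left skew.
        "mean_minus_median": mean - median,
    }

# Hypothetical monthly salaries (in thousands); the last value is deliberately extreme.
salaries = [3, 3, 4, 4, 4, 5, 5, 6, 7, 30]
summary = describe_distribution(salaries)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample, a single extreme salary pulls the mean above the median, which is the kind of skew the charts below make visible at a glance.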

Histograms

A histogram is a graph showing a numerical variable's distribution with bars. It is a convenient way to illustrate the major features of the distribution, especially when the data set is large. Key examples where histograms shine are capturing the salary distribution of employees in a company or the blood sugar levels of a cohort of patients.

A histogram depicting the age of death for Australian Males in 2022 (Source: Oosterbaan)

To build a histogram, the numerical data is first divided into several ranges or bins, and the frequency of occurrence of each range is counted. The horizontal axis shows the range, while the vertical axis represents the frequency or percentage of occurrences of a range. Histograms immediately showcase how a variable's distribution is skewed or where it peaks. Here are examples from our Data Visualization for Everyone course.

A histogram can be symmetric, left-skewed, or right-skewed. (Source: DataCamp)

A histogram can have multiple modes (Source: DataCamp)
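The binning step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes equal-width bins; the function name `histogram_counts` and the blood sugar readings are hypothetical, and the bars are drawn as text rather than with a charting library.

```python
def histogram_counts(values, bin_count):
    """Split the data range into equal-width bins and count the values in each bin."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("all values are identical, so there is no range to split")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it into it.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts

# Hypothetical blood sugar readings (mg/dL) for a small cohort.
readings = [70, 82, 85, 88, 90, 91, 93, 95, 97, 99, 101, 104, 108, 115, 130]
edges, counts = histogram_counts(readings, bin_count=4)

# One text bar per bin: the range on the left, the frequency as a row of '#'.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f} | {'#' * count}")
```

Each printed row corresponds to one bar of the histogram: the range is what the horizontal axis would show, and the number of `#` characters is the frequency.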

While histograms and bar charts bear resemblances, they serve distinct functions and thus are not to be confused. Here are the key differences.

	Histogram	Bar chart
Functional difference	To display the distribution of a numerical variable.	To compare values across categories.
Visual difference	There is no space between adjacent bars.	There is usually a space between bars.

Density plots

Just like a histogram, a density plot represents the distribution of a numerical variable. Unlike a histogram, a density plot uses a smooth line instead of bars. The horizontal axis of a density plot is the numerical variable, while the vertical axis shows the probability density. The probability that the variable falls within a given range is the area under the curve over that range.

The probability that a mouse has a birth weight of between 1.0 and 1.2 grams is the area under the density plot (Source: SPSS)
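To make the "area under the curve" idea concrete, here is a rough sketch of one common way to build a density curve: a Gaussian kernel density estimate, which places a small bell curve on every data point and averages them. The weights, the bandwidth of 0.1, and the function names are assumptions chosen for illustration. They are not the data or method behind the figure above.

```python
from statistics import NormalDist

def kde_density(x, sample, bandwidth):
    """Height of the density curve at x: the average of one bell curve per data point."""
    return sum(NormalDist(v, bandwidth).pdf(x) for v in sample) / len(sample)

def kde_probability(low, high, sample, bandwidth):
    """Area under the density curve between low and high."""
    total = 0.0
    for v in sample:
        kernel = NormalDist(v, bandwidth)
        total += kernel.cdf(high) - kernel.cdf(low)
    return total / len(sample)

# Hypothetical birth weights in grams; the bandwidth is an arbitrary smoothing choice.
weights = [0.8, 0.9, 1.0, 1.0, 1.1, 1.1, 1.2, 1.3, 1.4]
bandwidth = 0.1

p = kde_probability(1.0, 1.2, weights, bandwidth)
whole_area = kde_probability(-10.0, 10.0, weights, bandwidth)

print(f"Estimated probability of a weight between 1.0 and 1.2 g: {p:.3f}")
print(f"Area under the whole curve: {whole_area:.3f}")
```

The area under the whole curve comes out at essentially 1, which is why the area over any narrower range can be read as a probability.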

A density plot can show the distribution shape more effectively than a histogram. A histogram with too few or too many bins might hide the actual shape of the underlying distribution. In contrast, a density plot does not require binning and displays a smooth curve, although how smooth it looks depends on a bandwidth setting.

The choice of bin count in a histogram is crucial. (Source: Laerd Statistics)
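The effect of the bin count is easy to reproduce. The sketch below counts the same made-up sample with 2, 8, and 32 equal-width bins; the data and the helper name `bin_counts` are assumptions for illustration only.

```python
def bin_counts(values, bin_count):
    """Count how many values fall into each of bin_count equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # Clamp the maximum value into the last bin.
        counts[min(int((v - low) / width), bin_count - 1)] += 1
    return counts

# Hypothetical data with two clusters, one around 2 and one around 8.
data = [1, 2, 2, 2, 3, 3, 7, 7, 8, 8, 8, 9]

for bins in (2, 8, 32):
    print(f"{bins:2d} bins: {bin_counts(data, bins)}")
```

With 2 bins the sample looks flat, with 8 bins the two clusters and the gap between them appear, and with 32 bins most bins are empty and the shape breaks up into isolated spikes.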

A density plot is also better at comparing multiple distributions than a histogram.

Comparing distributions with density plots vs histograms (Source: Will Koehrsen)

Box plots

Histograms are well-suited for displaying the overall distribution of the data, but box plots are excellent at summarizing a distribution.

The anatomy of a box plot (Source: Galarnyk)

Visualizing data with a box plot reveals the following:

The median: The middle value of a dataset where 50% of the data is less than the median, and 50% of the data is higher than the median.
The upper quartile: The 75th percentile of a dataset where 75% of the data is less than the upper quartile, and 25% of the data is higher than the upper quartile.
The lower quartile: The 25th percentile of a dataset where 25% of the data is less than the lower quartile and 75% is higher than the lower quartile.
The interquartile range: The upper quartile minus the lower quartile.
The upper adjacent value: Or colloquially the “maximum”. It is the largest observation that lies no more than 1.5 times the interquartile range above the upper quartile, and it marks the end of the upper whisker.
The lower adjacent value: Or colloquially the “minimum”. It is the smallest observation that lies no more than 1.5 times the interquartile range below the lower quartile, and it marks the end of the lower whisker.
Outliers: Any values above the “maximum” or below the “minimum”.
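The quantities in this list can be computed directly. The sketch below uses Python's standard library and takes the whisker ends to be the most extreme observations within 1.5 times the interquartile range of the quartiles. The bill amounts and the function name `box_plot_stats` are hypothetical, and the "inclusive" quartile method is an assumption, since plotting tools differ slightly in how they compute quartiles.

```python
import statistics

def box_plot_stats(values):
    """Compute the numbers a box plot draws: quartiles, whisker ends, and outliers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Observations inside the fences; the whiskers end at the extremes of this group.
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    return {
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "iqr": iqr,
        "lower_adjacent": inside[0],
        "upper_adjacent": inside[-1],
        "outliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }

# Hypothetical restaurant bills; the last one is unusually large.
bills = [10, 12, 13, 15, 16, 18, 20, 22, 25, 60]
stats = box_plot_stats(bills)
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this sample the upper whisker stops at 25, the largest bill inside the fence, and the bill of 60 is drawn as a separate outlier point.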

Violin plots

A violin plot is a hybrid between a box plot and a density plot.

A violin plot showing the distribution of total bill (Source: DataCamp)

Like a density plot, a violin plot displays the density of the data. Like a box plot, it also shows summary statistics. Violin plots are an effective tool for simultaneously displaying and summarizing the distribution of a numerical variable.

The anatomy of a violin plot (Source: Hintze and Nelson)
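One way to think about a violin plot is as a density curve mirrored around a central axis, with a summary statistic such as the median marked on it. The sketch below follows that idea using a Gaussian kernel density estimate. The bill amounts, the bandwidth of 4, the grid spacing, and the function name `violin_outline` are all assumptions for illustration, and real plotting libraries handle smoothing and scaling in their own ways.

```python
from statistics import NormalDist, median

def violin_outline(sample, bandwidth, grid):
    """Return (value, left_edge, right_edge) rows: a density curve mirrored around zero."""
    outline = []
    for y in grid:
        density = sum(NormalDist(v, bandwidth).pdf(y) for v in sample) / len(sample)
        outline.append((y, -density, density))
    return outline

# Hypothetical restaurant bills; bandwidth and grid are arbitrary choices.
bills = [10, 12, 13, 15, 16, 18, 20, 22, 25, 60]
outline = violin_outline(bills, bandwidth=4, grid=range(0, 71, 5))

middle = median(bills)
widest = max(outline, key=lambda row: row[2])[0]

# Text rendering: each row is centred, and its width is proportional to the density.
for y, left, right in outline:
    bar = "#" * round(right * 400)
    print(f"{y:3d} {bar:^40}")
print(f"median: {middle}, widest point near: {widest}")
```

The widest part of the outline shows where values are most concentrated, while the median provides the box-plot-style summary drawn inside the violin.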

Get Started with Data Visualization Today

We hope you enjoyed this short introduction to data visualization. In the next series entry, we’ll look at how AI is covered in the news and how to grow a healthy skepticism around the latest advancements in the field. To start your data learning journey today, check out the following resources.

Data Visualization Courses

Data Visualization in Google Sheets (4 hr): Learn the fundamentals of data visualization using Google Sheets.
Understanding Data Visualization (2 hr): An introduction to data visualization with no coding involved.
Data Visualization in Power BI (3 hr): Power BI is a powerful data visualization tool that can be used in reports and dashboards.
