
Data Demystified: Data Visualizations that Capture Distributions

In part 10 of Data Demystified, we'll dive deep into the world of data visualization, continuing with visualizations that capture distributions.
Sep 2022  · 8 min read

Welcome to part ten of our month-long data demystified series. As part of Data Literacy Month, this series will clarify key concepts from the world of data, answer the questions you may be too afraid to ask and have fun along the way. If you want to start at the beginning, read our first entry in the series: What is a Dataset?


This week, we'll cover common data visualizations and how to interpret them. Data visualization is often called the gateway drug into data science, and this blog post focuses on visualizations that capture distributions.

Visualizations that Capture Distributions

A key use case of data visualization is capturing the distribution of a variable. Seeing a distribution helps you understand critical statistical properties of the data you're visualizing and helps audiences make informed, data-driven decisions on key outcomes. Before diving in, here are some key questions to keep in mind when visualizing distributions:

  1. What is the shape of the distribution? Is it symmetrical or skewed?
  2. What is the distribution's spread? A simple measure is the range: the difference between the smallest and the largest value in the dataset.
  3. Are there any outliers in the distribution?
  4. Is there any pattern to the distribution? Is it random, or is there an obvious shape?
  5. What is the typical value (mean, median, or mode)?
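Several of these questions can be answered numerically before anything is plotted. The sketch below uses Python's standard `statistics` module on a small, hypothetical list of employee ages (not data from this article) to compute the range, mean, median, and mode; the function name `describe` is likewise only illustrative.

```python
import statistics


def describe(values):
    """Summarize the spread and typical value of a list of numbers."""
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "range": hi - lo,                       # one simple measure of spread
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # every most-frequent value
    }


# Hypothetical example data: ages of ten employees
ages = [23, 25, 25, 29, 31, 34, 35, 41, 52, 67]
summary = describe(ages)
print(summary)
```

Here the mean (36.2) sits above the median (32.5) because the single large value, 67, pulls it upward. That gap is only a hint of right skew; a plot makes the shape visible.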

The four visualizations below help us answer these questions.

Histograms

A histogram is a graph that shows the distribution of a numerical variable using bars. It is a convenient way to illustrate the major features of a distribution, especially when the dataset is large. Histograms shine at capturing, for example, the salary distribution of employees in a company or the blood sugar levels of a cohort of patients.


A histogram depicting the age at death for Australian males in 2022 (Source: Oosterbaan)

To build a histogram, the range of the numerical data is first divided into several intervals, or bins, and the number of values falling into each bin is counted. The horizontal axis shows the bins, while the vertical axis shows the frequency (count) or percentage of values in each bin. Histograms immediately show whether a variable's distribution is skewed and where it peaks. Here are examples from our Data Visualization for Everyone course.
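The binning step can be sketched in a few lines of plain Python. This minimal version assumes equal-width bins where each bin includes its left edge but not its right, except the last bin, which also includes the maximum (the same convention `numpy.histogram` uses). The function name `histogram_counts` and the sample ages are hypothetical.

```python
def histogram_counts(values, num_bins):
    """Split the data's range into equal-width bins and count the values in each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; there is no range to split")
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for v in values:
        # Bins are [left, right); the maximum itself goes into the last bin
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return edges, counts


# Hypothetical ages of ten employees
ages = [23, 25, 25, 29, 31, 34, 35, 41, 52, 67]
edges, counts = histogram_counts(ages, num_bins=4)

# A crude text histogram: one '#' per value in the bin
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f}-{right:5.1f} | {'#' * count}")
```

The counts are the bar heights; a plotting library draws the same numbers as adjacent bars.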


A histogram can be symmetric, left-skewed, or right-skewed. (Source: DataCamp)


A histogram can have multiple modes (Source: DataCamp)
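Whether a histogram is symmetric, left-skewed, or right-skewed can also be checked with a single number. One common measure, not one the article itself names, is the moment-based Fisher-Pearson skewness coefficient: positive values indicate a longer right tail and negative values a longer left tail. The ±0.5 cutoff below is an arbitrary illustrative threshold, not a standard, and the function names are hypothetical.

```python
def skewness(values):
    """Moment-based (Fisher-Pearson) skewness coefficient g1."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


def describe_shape(values, threshold=0.5):
    """Label a distribution's shape; the 0.5 threshold is an arbitrary choice."""
    g1 = skewness(values)
    if g1 > threshold:
        return "right-skewed"
    if g1 < -threshold:
        return "left-skewed"
    return "roughly symmetric"


print(describe_shape([1, 2, 3, 4, 5]))                # roughly symmetric
print(describe_shape([1, 1, 1, 2, 2, 3, 10]))         # right-skewed (long right tail)
print(describe_shape([-1, -1, -1, -2, -2, -3, -10]))  # left-skewed (long left tail)
```

A single coefficient cannot reveal multiple modes, so it complements the histogram rather than replacing it.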

While histograms and bar charts look similar, they serve distinct functions and should not be confused. Here are the key differences.

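|                       | Histogram                                           | Bar chart                              |
|-----------------------|-----------------------------------------------------|----------------------------------------|
| Functional difference | Displays the distribution of a numerical variable.  | Compares values across categories.     |
| Visual difference     | There is no space between bars.                     | There is usually a space between bars. |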

Density plots

Like a histogram, a density plot represents the distribution of a numerical variable, but it uses a smooth curve instead of bars. The horizontal axis of a density plot shows the numerical variable, while the vertical axis shows the probability density. The probability that the variable falls within a given range equals the area under the curve over that range, and the total area under the curve is 1.


The probability that a mouse has a birth weight between 1.0 and 1.2 grams is the area under the density curve over that interval (Source: SPSS)
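The area-under-the-curve idea can be made concrete with a small sketch. It assumes the density curve is a Gaussian kernel density estimate (a common way density plots are built, though the article does not say which method its figure uses), uses hypothetical mouse birth weights with an arbitrarily chosen bandwidth of 0.05 g, and approximates the area with the trapezoidal rule. The helper names are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Build a density function: the average of Gaussian bumps centred on each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def area_under(f, a, b, steps=1000):
    """Approximate the integral of f from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    inner = sum(f(a + i * h) for i in range(1, steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


# Hypothetical birth weights in grams (illustrative, not the SPSS data)
weights = [0.92, 1.01, 1.05, 1.08, 1.11, 1.14, 1.19, 1.27]
density = gaussian_kde(weights, bandwidth=0.05)

p = area_under(density, 1.0, 1.2)                  # probability of 1.0 g to 1.2 g
total = area_under(density, 0.0, 2.5, steps=5000)  # should be very close to 1
print(f"P(1.0 g <= weight <= 1.2 g) is approximately {p:.3f}")
print(f"Total area under the curve: {total:.4f}")
```

Checking that the total area is close to 1 is a quick sanity test that the curve really is a probability density.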

A density plot can show the shape of a distribution more clearly than a histogram. A histogram with too few or too many bins can hide the actual shape of the underlying distribution. In contrast, a density plot does not require binning and draws a smooth curve, although its smoothing bandwidth plays a role similar to the bin width.


The choice of bin count in a histogram is crucial. (Source: Laerd Statistics)
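Two widely used rules of thumb for choosing a bin count, neither of which the article prescribes, are Sturges' rule and the Freedman-Diaconis rule. The sketch below implements both with the standard library; the function names and sample data are hypothetical, and the quartiles use `statistics.quantiles` with `method="inclusive"`, one of several conventions. NumPy exposes the same rules through `numpy.histogram_bin_edges(data, bins="sturges")` and `bins="fd"`.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n ** (1/3)."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    width = 2 * (q3 - q1) / n ** (1 / 3)
    if width == 0:
        raise ValueError("IQR is zero; the rule cannot choose a bin width")
    # Number of bins needed to cover the data's range at that width
    return math.ceil((max(values) - min(values)) / width)


data = list(range(1, 101))  # 100 evenly spread, hypothetical values
print(sturges_bins(len(data)), freedman_diaconis_bins(data))
```

Sturges' rule depends only on the sample size and is known to give too few bins for large samples, while Freedman-Diaconis adapts to the data's spread through the IQR, which also makes it less sensitive to outliers.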

A density plot is also better than a histogram for comparing multiple distributions, since several overlaid curves stay readable where overlaid bars would hide one another.


Comparing distributions with density plots vs histograms (Source: Koehrsen)

Box plots

Histograms are well suited to displaying the overall distribution of the data, while box plots excel at summarizing a distribution in a handful of statistics.


The anatomy of a box plot (Source: Galarnyk)

Visualizing data with a box plot reveals the following:

  1. The median: the middle value of the dataset. Half of the data lies below the median and half lies above it.
  2. The upper quartile (Q3): the 75th percentile of the dataset. 75% of the data lies below the upper quartile and 25% lies above it.
  3. The lower quartile (Q1): the 25th percentile of the dataset. 25% of the data lies below the lower quartile and 75% lies above it.
  4. The interquartile range (IQR): the upper quartile minus the lower quartile.
  5. The upper adjacent value, colloquially the “maximum”: the largest data point that does not exceed the upper quartile plus 1.5 times the interquartile range. The upper whisker ends here.
  6. The lower adjacent value, colloquially the “minimum”: the smallest data point that is not below the lower quartile minus 1.5 times the interquartile range. The lower whisker ends here.
  7. Outliers: any values above the “maximum” or below the “minimum”, usually drawn as individual points.
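The seven quantities above can be computed directly. This sketch assumes Tukey-style whiskers at 1.5 × IQR and uses `statistics.quantiles` with `method="inclusive"`, one of several quartile conventions, so other tools may report slightly different quartiles. The function name `box_plot_stats` and the ages are hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Compute the quantities a box plot draws (Tukey-style 1.5 * IQR whiskers)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Whiskers end at the most extreme data points still inside the fences
        "lower_adjacent": min(inside),
        "upper_adjacent": max(inside),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical ages of ten employees
ages = [23, 25, 25, 29, 31, 34, 35, 41, 52, 67]
stats = box_plot_stats(ages)
print(stats)
```

For this data the upper fence is 59.75, so the upper whisker stops at 52 and the value 67 is drawn as an outlier.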

Violin plots

A violin plot is a hybrid between a box plot and a density plot. 


A violin plot showing the distribution of total bill amounts (Source: DataCamp)

Like a density plot, a violin plot displays the estimated density of the data, mirrored on both sides of a central axis. Like a box plot, it also shows summary statistics, typically the median and the interquartile range. Violin plots are therefore an effective tool for displaying and summarizing the distribution of a numerical variable at the same time.
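A violin plot's outline is essentially a density estimate drawn on both sides of a vertical axis, with quartiles marked inside. The sketch below computes those ingredients for hypothetical total-bill amounts, assuming a Gaussian kernel, a hand-picked bandwidth of 2.0, and an outline clipped to the data's minimum and maximum (plotting libraries such as seaborn may extend it further). The function name `violin_outline` is hypothetical.

```python
import math
import statistics


def violin_outline(values, bandwidth, num_points=50):
    """Return y positions, left/right outline x positions, and quartiles."""
    n = len(values)
    lo, hi = min(values), max(values)
    ys = [lo + (hi - lo) * i / (num_points - 1) for i in range(num_points)]
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    # Gaussian kernel density estimate evaluated along the vertical axis
    right = [norm * sum(math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in values)
             for y in ys]
    left = [-w for w in right]  # mirror image gives the "violin" shape
    quartiles = statistics.quantiles(values, n=4, method="inclusive")
    return ys, left, right, quartiles


# Hypothetical total-bill amounts in dollars
bills = [10.3, 12.5, 14.8, 15.0, 16.2, 18.4, 20.1, 23.7, 28.9, 35.3]
ys, left, right, (q1, median, q3) = violin_outline(bills, bandwidth=2.0)
print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
```

Plotting `left` and `right` against `ys` would draw the violin's outline, and the three quartiles give the box-plot-style summary drawn inside it.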


The anatomy of a violin plot (Source: Hintze and Nelson)

Get Started with Data Visualization Today

We hope you enjoyed this short introduction to data visualization. In the next series entry, we’ll look at how AI is covered in the news and how to grow a healthy skepticism around the latest advancements in the field. To start your data learning journey today, check out the following resources. 


Data Visualization Courses

Understanding Data Visualization (Course, 2 hr)

An introduction to data visualization with no coding involved.