Skip to main content

How to Create a Histogram with Plotly

Learn how to implement histograms in Python using the Plotly data visualization library.
Jan 2023  · 12 min read

Histograms are popular graphical displays of data points organized into specified ranges. We typically see them in statistics, used to display the amount of a specific type of variable that appears within a certain range. For example, your old math teacher may have used a histogram to visualize the number of marks obtained for all the people in your class. 

Histogram displaying the distribution of marks

Figure 1: An example Histogram displaying the distribution of marks for a class. 

[Source: Example Histogram with Matplotlib] 

Their distinctive bars of varying heights resemble the aesthetics of bar charts, except histograms are used to visualize quantitative data instead of categorical data – which is the case for bar charts. 

Generally, numerical data visualized in histograms are continuous, meaning they have infinite values. Attempting to plot the values of continuous variables is often impractical. Histograms mitigate this issue by grouping several data points into logical ranges (known as bins) and allowing them to be visualized. 

In this tutorial, we will cover how to implement histograms in Python using the Plotly data visualization library. We will also touch on different ways to customize your Plotly histogram and why data visualization is important in the first place. 

Check out this DataCamp Workspace to follow along with the code in this tutorial. 

The Importance of Data Visualization in Data Analysis

No business can claim it will not benefit from understanding its data better. The role of data analysis is to help businesses optimize their performance and improve their bottom line; this science spans its wings to all fields where data exists since insight is generally regarded as leverage.   

Data visualizations are the graphical representations analysts use to represent information and data. Its purpose is to summarize data to be presented in an easily understandable format, highlighting vital lessons from the data to its readers. 

In other words, businesses use data visualizations to aid in rapidly identifying data trends which would otherwise be difficult without visual assistance. Graphical representations make it easy to communicate information in a manner that is universal, fast, and effective. 

An Introduction to Plotly

Plotly is an interactive, open-source data visualization library for Python and R written in JavaScript. For this article, we will focus on Plotly Python, occasionally referred to as “plotly.py,” to distinguish it from its JavaScript counterpart. 

Tip: Take the Introduction to data visualization with Plotly in Python or Interactive data visualization with Plotly in R courses to get to grips with Plotly.   

Many users of Ploty often remark they were initially “captivated” by the modern aesthetics the library enables storytellers to implement quite easily. Plotly Python allows users to create attractive, interactive web-based visualizations that can be: 

  • displayed in Jupyter notebooks, 
  • saved to standalone HTML files, 
  • served as part of pure Python-built web applications using Dash.

Another appeal of Plotly is the vast range of chart types available. The library supports more than 40 unique charts spanning various use cases, including statistics, finance, geography, science, and 3-dimensional charts. 

It’s easy to export visualizations created with Plotly due to the library's deep integration with Kaleido – an image export utility. There’s also extensive support for non-web contexts, including desktop editors (i.e., Spyder, PyCharm, etc.) and publishing static documents (i.e., exporting Jupyter notebooks to PDF format). 

Installing plotly.py is reasonably straightforward; It may be installed using pip: 

pip install plotly>=5.11.0, <5.12.0

Or using Conda: 

conda install -c plotly plotly>=5.11.0, <5.12.0

Note: It’s recommended to install libraries in a virtual environment to keep the dependencies for different projects separate. You can also get started with a DataCamp Workspace - a Python & R Data Science IDE in the cloud. 

Using DataCamp Workspaces

The DataCamp Workspace is an online Integrated Development Environment (IDE) that allows users to write code, analyze data individually or collectively, and share data insights. In other words, you can think of DataCamp Workspace as a kind of Google Docs specifically conceived for data science; learn more about how they work in our article on Using DataCamp Workspace in the Classroom.

For this task, we must create a workspace from a dataset. Fortunately, DataCamp provides a rich and growing body of interesting datasets to analyze. When we create a workspace from a dataset, the environment will contain all the pre-configured data and a brief description of the data so you can get your hands dirty from the offset.   

The steps to create a workspace using a DataCamp dataset are as follows: 

  • From the Workspace dashboard, select Use a dataset in the sidebar.
  • Select a dataset you're interested in. A dialog box displays. It provides a preview of the selected dataset, including a data dictionary and a link to the dataset source, a possible choice of language (Python or R), and an overview of publications that other Workspace users created with this dataset.
  • Select Use Dataset.

create a DataCamp Workspace

Figure 2: A GIF demonstrating the steps to create a DataCamp Workspace from a dataset.

Note: View Creating a Plotly Histogram with Olympics Data to follow along with the code.

We will use the Olympics Data dataset from DataCamp’s internal data collection. According to the description, this is a historical dataset on the modern Olympic Games, from Athens 1896 to Rio 2016. Each row consists of an individual athlete (271116 instances) competing in an Olympic event and which medal was won (if any).

The columns consist of the following: 

Column

Explanation

id

Unique number for each athlete

name

Athlete's name

sex

M or F

age

Age of the athlete

height

In centimeters

weight

In kilograms

team

Team name

noc

National Olympic Committee 3

games

Year and season

year

Integer

season

Summer or Winter

city

Host city

sport

Sport

event

Event

medal

Gold, Silver, Bronze, or NA

Given our current understanding, the columns we can expect to build a histogram from are: age, height, weight, and year. This is because histograms are constructed using continuous numerical values - they are used to represent the distribution of numerical data, where the data are binned, and the count for each bin is displayed. 

We will use the age column to create our histogram. 

Creating a Plotly Histogram

Creating a histogram in Python with Plotly is pretty simple; We can use Plotly Express, which is an easy-to-use, high-level interface…

import plotly.express as px

# Create a histogram
fig = px.histogram(olympic_data.age, x="age",
                  title="Distribution of Athletes age")
fig.show()

Histogram created using Plotly ExpressFigure 3: A histogram created using Plotly Express

We also have the option to use Graph Objects.

import plotly.graph_objects as go

fig = go.Figure(data=[go.Histogram(x=olympic_data.age)])

# add the title
fig.update_layout(title=dict(text="Distribution of Athletes age"))
fig.show()

histogram created using Plotly Graph ObjectsFigure 4: A histogram created using Plotly Graph Objects. 

Plotly express functions internally make calls to graph_objects, which returns a value. Therefore, Plotly express functions are useful because they enable users to draw data visualizations that would typically take more lines of code if they were to be drawn using graph_objects

“The plotly.express module (usually imported as px) contains functions that can create entire figures at once and is referred to as Plotly Express or PX. Plotly Express is a built-in part of the Plotly library and is the recommended starting point for creating most common figures. Every Plotly Express function uses graph objects internally and returns a plotly.graph_objects.Figure instance.”

Source: Plotly Express in Python

Despite Plotly express being the recommended entry point to the Plotly library, we will use graph_objects to customize the appearance of our histogram as it has more flexibility.  

Customizing the appearance of the Plotly histogram

Currently, our histogram is pretty bland. Let’s spice it up by changing the fonts we use for our title and adding axis labels. 

# Create histogram
fig = go.Figure(data=[go.Histogram(x=olympic_data.age)])

fig.update_layout(
    # Set the global font
    font = {
        "family":"Times new Roman",
        "size":16
    },
    # Update title font
    title = {
        "text": "Distribution of Athletes age",
        "y": 0.9, # Sets the y position with respect to `yref`
        "x": 0.5, # Sets the x position of title with respect to `xref`
        "xanchor":"center", # Sets the title's horizontal alignment with respect to its x position
        "yanchor": "top", # Sets the title's vertical alignment with respect to its y position. "      
        "font": { # Only configures font for title
            "family":"Arial",
            "size":20,
            "color": "red"
        }
    }
)

# Add X and Y labels
fig.update_xaxes(title_text="Age")
fig.update_yaxes(title_text="Number of Athletes")

# Display plot
fig.show()

Plotly histogram with an updated titleFigure 5: Plotly histogram with an updated title and axis labels added in. 

Our histogram is more informative with these additions. 

Next, let’s change the size of each bin. 

# Create histogram
fig = go.Figure(data = [
    go.Histogram(
        x = olympic_data.age,
        xbins=go.histogram.XBins(size=5) # Change the bin size to 5
    )
  ]
)

Plotly histogram with bin size equal to 5Figure 6: Plotly histogram with bin size equal to 5. 

In the code above, we set the xbins parameter in the go.Histogram instance to a go.histogram.Xbins instance and defined the bin size we want as a parameter of that instance. Note we could have also performed this step using a dict object with compatible properties. 

Lastly, let’s change the color of the bins to orange to make the visualization stand out a little. 

# Create histogram
fig = go.Figure(data = [
    go.Histogram(
        x = olympic_data.age,
        xbins=go.histogram.XBins(size=5), # Change the bin size
        marker=go.histogram.Marker(color="orange") # Change the color
    )
  ]
)

Plotly histogram with orange binsFigure 7: Plotly histogram with orange bins.

Much better. 

In the code above, we set the marker parameter in the go.Histogram instance to a go.histogram.Marker instance and defined the color we want as a parameter of that instance. 

However, there are many other customizations we can make to our Plotly histogram, such as adding hover text. 

Enabling hover text

Text detailing what each bar represents is displayed whenever you hover over a bar in the current histogram - see Figure 8. 

current hover text

Figure 8: The current hover text displayed when hovering over our histogram. 

This is not very intuitive. When we perform data visualization, we want our plots to be self-explanatory - even down to the hover text. Therefore, let’s update the information displayed when we hover over a bin. 

# Create histogram
fig = go.Figure(data = [
    go.Histogram(
        x = olympic_data.age,
        xbins=go.histogram.XBins(size=5), # Change the bin size
        marker=go.histogram.Marker(color="orange"), # Change the color
        hovertemplate="<br>".join([
            "Age Range: %{x}",
            "Number of Athletes: %{y}"]) # Make hover text more informative
    )
  ]
)

updated hover textFigure 9: The updated hover text information. 

The text shown when we hover over a bin is much more informative, which makes our visualization much easier to interpret. We did this by updating the hovertemplate parameter in our go.Histogram instance. 

Adding filtering options

Earlier in the tutorial, we said we could expect to build a histogram including the age, height, weight, and year features. There’s no need for us to build an individual histogram for each one. 

Instead, we could build a single histogram and add filter buttons that allow you to toggle through which feature to display on the visualization. 

Here’s how it looks in code.

"""
Starter code provided by vestland on StackOverflow
Link: https://bit.ly/3ipqOMo
"""
# continuous features
continuous_vars = ["age", "height", "weight", "year"]

# plotly
fig = go.Figure()

# set up ONE trace
fig.add_trace(
    go.Histogram(x = olympic_data.age,
                xbins=go.histogram.XBins(size=5), # Change the bin size
                marker=go.histogram.Marker(color="orange"), # Change the color
            )
)

buttons = []

# button with one option for each dataframe
for col in continuous_vars:
    buttons.append(dict(method='restyle',
                        label=col,
                        visible=True,
                        args=[{"x":[olympic_data[col]],
                              "type":'histogram', [0]],
                        )
                  )

# some adjustments to the updatemenus
updatemenu = []
your_menu = dict()
updatemenu.append(your_menu)

updatemenu[0]['buttons'] = buttons
updatemenu[0]['direction'] = 'down'
updatemenu[0]['showactive'] = True

# add dropdown menus to the figure
fig.update_layout(showlegend=False, updatemenus=updatemenu)
fig.show()

histogram with filters appliedFigure 10: The histogram with filters applied that enable users to toggle through various features on a single visualization.

And there you have it; be sure to check out the Creating a Plotly Histogram with Olympics Data workspace, and play around with what we’ve built in this tutorial. 

Wrap up 

You now know how to create a histogram in Plotly Python; In this tutorial, we covered:

  • What a histogram is and how it works
  • The importance of data visualization in data analysis
  • How to get started with DataCamp workspaces
  • How to create a histogram in Plotly 
  • Different ways to customize your histogram
  • How to apply filters 

If you’re looking for further resources about creating histograms or using Plotly, check out the links below:

Interactive Data Visualization with plotly in R

Beginner
4 hours
9,105
Learn how to use plotly in R to create interactive data visualizations to enhance your data storytelling.
See DetailsRight Arrow
Start Course

Introduction to Data Visualization with Plotly in Python

Beginner
4 hours
9,163
Create interactive data visualizations in Python using Plotly.

Building Dashboards with Dash and Plotly

Beginner
4 hours
6,283
Learn how to build interactive and insight-rich dashboards with Dash and Plotly.
See all coursesRight Arrow
Related

SQL vs Python: Which Should You Learn?

In this article, we will cover the main features of Python and SQL, their main similarities and differences, and which one you should choose first to start your data science journey.
Javier Canales Luna 's photo

Javier Canales Luna

12 min

Text Data In Python Cheat Sheet

Welcome to our cheat sheet for working with text data in Python! We've compiled a list of the most useful functions and packages for cleaning, processing, and analyzing text data in Python, along with clear examples and explanations, so you'll have everything you need to know about working with text data in Python all in one place.
DataCamp Team's photo

DataCamp Team

4 min

Unit Testing in Python Tutorial

Learn what unit testing is, why its important, and how you can implement it with the help of Python.
Abid Ali Awan's photo

Abid Ali Awan

10 min

Python Sets and Set Theory Tutorial

Learn about Python sets: what they are, how to create them, when to use them, built-in functions and their relationship to set theory operations.
DataCamp Team's photo

DataCamp Team

13 min

Pandas Tutorial: DataFrames in Python

Explore data analysis with Python. Pandas DataFrames make manipulating your data easy, from selecting or replacing columns and indices to reshaping your data.
Karlijn Willems's photo

Karlijn Willems

35 min

Joining DataFrames in pandas Tutorial

In this tutorial, you’ll learn various ways in which multiple DataFrames could be merged in python using Pandas library.
DataCamp Team's photo

DataCamp Team

19 min

See MoreSee More