How to Make a Histogram with ggplot2
• March 11, 2019 • 13 min read
In a previous blog post, you learned how to make histograms with the hist() function. You can also make histograms by using ggplot2, “a plotting system for R, based on the grammar of graphics” that was created by Hadley Wickham. This post will focus on making a histogram with ggplot2.

Steps
- Check that you have ggplot2 installed
- The Data
- Making your Histogram with ggplot2
- Taking it one Step Further
- Feeling Like Going Far and Beyond?
(Want to learn how to make more plots with ggplot2? Try this interactive course on data visualization with ggplot2.)
Step One. Check that you have ggplot2 Installed
First, go to the “Packages” tab in RStudio, an IDE for working with R efficiently, search for ggplot2 and mark the checkbox. If the package is not listed, you need to install it first: stay in the same tab and click “Install”. Enter ggplot2, press ENTER and wait one or two minutes for the package to install.
You can also install ggplot2 from the console with the install.packages() function:
install.packages("ggplot2")
To load the ggplot2 package, execute the following command:
library(ggplot2)
Step Two. The Data
Let’s leave the ggplot2 package for a moment and make sure that you have a dataset to work with: import the necessary file or use one that is built into R. This tutorial will again work with the chol dataset.
If you’re just tuning into this tutorial series, you can download the dataset from here.
You can load the chol dataset by using the url() function embedded in the read.table() function. Next, you can check whether the import was successful with functions such as head(), summary() and str():
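The address of the data file is not reproduced in this post, so the following is only a minimal sketch of the import-and-inspect pattern. It reads a few made-up rows from a text string instead of a url() connection; the values and the single AGE column are stand-ins, not the real chol data.

```r
# Made-up rows standing in for the real file (assumption: one AGE column).
# With the real data you would pass url(<address of the file>) to
# read.table() instead of the text argument.
raw <- "AGE
23
31
38
44
50"

chol <- read.table(text = raw, header = TRUE)

head(chol)     # first rows of the data frame
summary(chol)  # summary statistics per column
str(chol)      # structure: column names and types
```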
Note that you use the head() function to retrieve the first rows of the chol data frame, while you use summary() to return a summary of the chol object. Lastly, you can use str() to display the structure of the chol data frame.
Tip: if you want to double-check the class of the chol data frame, use the class() function, like this: class(chol).
Step Three. Making your Histogram with ggplot2
You have two options to create your histograms with the ggplot2 package. On the one hand, you can use the qplot() function, which looks very much like the hist() function:
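A minimal sketch of such a call follows. It uses a made-up stand-in for the chol data (200 random ages), since the real file is not loaded here. Note also that recent ggplot2 releases (3.4.0 and later) mark qplot() as deprecated, so you may see a warning.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# Quick histogram of the AGE column
p <- qplot(chol$AGE, geom = "histogram")
p
```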

You see that it’s easy to plot with the qplot() function: you pass in the data that you want to have on the x-axis, in this case chol$AGE, and by adding the geom argument, you specify the type of graph you want. In this case, by specifying "histogram", you indicate that you want to plot the distribution of chol$AGE.
On the other hand, you can also use the ggplot() function to make the same histogram. In this case, you take the dataset chol and pass it to the data argument. Next, pass the AGE column from the dataset as values on the x-axis and compute a histogram of this:
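As a sketch, again with a made-up stand-in for the chol data, the call could look like this. Referring to the column by its bare name inside aes() is a style choice made here; ggplot2 looks the name up in the data frame passed to data.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# Map AGE to the x-axis, then add a histogram layer
p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram()
p
```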

As you saw before, ggplot2 is an implementation of the grammar of graphics, which means that there is a basic grammar to producing graphics: you need data and graphical elements to make your plots, just like you need a personal pronoun and a conjugated verb to make sentences. This means that you feed data to a plot as x and y elements, and you manipulate details such as colors and markers as graphical elements, which are added as layers.
This is precisely what happens in this plot: besides the data argument that you specify, you also add aes to describe how variables in the data (such as chol$AGE) are mapped to visual properties of geoms (geom_histogram() in this case, which is added as a layer).
But what is the difference between these two options?
The qplot() function is supposed to make the same graph as ggplot(), but with a simpler syntax. The name might seem entirely random, but it really isn’t: qplot() is short for “quick plot”, and it’s a shortcut designed to be familiar if you’re used to base plot(). While ggplot() allows for maximum features and flexibility, qplot() is a more straightforward but less customizable wrapper around ggplot().
Note: in practice, ggplot() is used more often.
Step Four. Taking it one Step Further
Now that you know how to make a basic histogram with this R package that is based on the grammar of graphics, it’s time to take things up a notch and adjust the qplot() and ggplot() histograms that you have just made to customize them to your needs.
Adjusting qplot()
The options to adjust your histogram through qplot() are not too extensive, but this function does allow you to change the basics to improve the visualization and hence the understanding of the histograms. All you need to do is add some more arguments, just like you did with the hist() function.
You might have already seen the following message pop up with the previous histograms: `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. The message refers to the binwidth argument that you can add to the qplot() and ggplot() functions to change the width of the histogram bins.
In any case, you could adjust the original plot to look like this:
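The following sketch puts together the arguments that the rest of this section discusses one by one. The data is a made-up stand-in for chol, so the bars will not match the real dataset.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

p <- qplot(chol$AGE,
           geom = "histogram",
           binwidth = 5,                # bins that are 5 years wide
           main = "Histogram for Age",  # plot title
           xlab = "Age",                # x-axis label
           fill = I("blue"),            # bin filling
           col = I("red"),              # bin borders
           alpha = I(0.2),              # transparency of the filling
           xlim = c(20, 50))            # range shown on the x-axis
p
```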

Tip: compare the arguments to the ones that are used in the hist() function in the first part of this tutorial series to get some more insight!
You’ll have a histogram for the AGE column in the chol dataset, with the title Histogram for Age and a label for the x-axis (Age), with bins of a width of 5 that range from values 20 to 50 on the x-axis and that have a transparent blue filling and red borders.
Since the R commands are only getting longer and longer, you might need some help to understand what each part of the code does to the histogram’s appearance.
Let’s just break it down to smaller pieces:
Bins
You can change the bin width by specifying a binwidth argument in your qplot() function. Try a few different values to see how the histogram changes.
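A minimal sketch with a made-up stand-in for the chol data; the value 0.5 is only an example to start from.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# Narrow bins of half a year; try 1, 2 or 5 to compare
p <- qplot(chol$AGE, geom = "histogram", binwidth = 0.5)
p
```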

Names/colors
As with the hist() function, you can use the argument main to change the title of the histogram:
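For instance, assuming the same made-up stand-in for the chol data:

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# main sets the plot title
p <- qplot(chol$AGE,
           geom = "histogram",
           binwidth = 0.5,
           main = "Histogram for Age")
p
```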

To change the labels that refer to the x- and y-axes, use xlab and ylab, just like you do when you use the hist() function.
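A short sketch of both label arguments, with a made-up stand-in for the chol data; the label texts are just examples.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# xlab and ylab set the axis labels
p <- qplot(chol$AGE,
           geom = "histogram",
           binwidth = 0.5,
           main = "Histogram for Age",
           xlab = "Age",
           ylab = "Count")
p
```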

However, if you want to adjust the colors of your histogram, you have to take a slightly different approach than with the hist() function:
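In this sketch the fill color is wrapped in I(), again with a made-up stand-in for the chol data:

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# I("blue") sets the filling to a fixed color instead of mapping it
p <- qplot(chol$AGE,
           geom = "histogram",
           binwidth = 0.5,
           main = "Histogram for Age",
           xlab = "Age",
           fill = I("blue"))
p
```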

This different approach also applies if you want to change the border of the bins. You add the col argument, with the I() function in which you can nest a color:
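A sketch with both a fill and a border color, using a made-up stand-in for the chol data:

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# col = I("red") draws red borders around the bins
p <- qplot(chol$AGE,
           geom = "histogram",
           binwidth = 0.5,
           main = "Histogram for Age",
           xlab = "Age",
           fill = I("blue"),
           col = I("red"))
p
```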

The I() function inhibits the interpretation of its arguments. In this case, the col argument is affected. Without it, the qplot() function would print a legend saying col = "red", which is definitely not what you want in this case (Muenchen et al. 2010).
Tip: try removing the I() function and see for yourself what happens!
If you want to set the transparency of the bins’ filling, just add the argument alpha, together with a value that is between 0 (fully transparent) and 1 (opaque). In the code below, set alpha to 0.2:
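A minimal sketch, assuming the same made-up stand-in for the chol data:

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# alpha = I(0.2) makes the filling mostly transparent
p <- qplot(chol$AGE,
           geom = "histogram",
           binwidth = 0.5,
           main = "Histogram for Age",
           xlab = "Age",
           fill = I("blue"),
           col = I("red"),
           alpha = I(0.2))
p
```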

Note that the I() function is used here as well! Again, try to leave this function out and see what effect this has on the histogram.
X- and Y-Axes
The qplot() function also allows you to set limits on the values that appear on the x- and y-axes. Just use xlim and ylim, in the same way as described for the hist() function in the first part of this tutorial on histograms. After adding the xlim argument and some reasonable parameters, you end up with the histogram from the start of this section:
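As a sketch, with a made-up stand-in for the chol data; ggplot2 may warn that it removed some rows, because bars that fall on or outside the limits are dropped.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# xlim takes a vector with the lower and upper limit of the x-axis
p <- qplot(chol$AGE,
           geom = "histogram",
           binwidth = 5,
           main = "Histogram for Age",
           xlab = "Age",
           fill = I("blue"),
           col = I("red"),
           alpha = I(0.2),
           xlim = c(20, 50))
p
```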

Tip: do not forget to use the c() function to specify xlim and ylim!
Adjusting ggplot()
Just like the two other options that have been discussed so far, adjusting your histogram through the ggplot() function is also very easy. The general message stays the same: just add more code to the original code that plots your (basic) histogram!
This way, you can customize your basic ggplot!
In the following exercise, you’ll use the chol data again to make a histogram. More specifically, you’ll plot the chol$AGE data along the x-axis. After that, you’ll use the geom_histogram() function to tell ggplot2 that you’re actually interested in plotting the distribution of chol$AGE with the help of a histogram. Lastly, you customize your ggplot by adding labs(), to which you’ll pass the title, x and y arguments to add labels, and xlim() and ylim() to set the limits of the x- and y-axes.
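Put together, these steps could look like the sketch below. The data is a made-up stand-in for chol, and the specific colors, breaks and limits are example values, not necessarily the ones from the original exercise.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),  # bin edges every 2 years
                 col = "red",                   # bin borders
                 fill = "green",                # bin filling
                 alpha = 0.2) +                 # transparency of the filling
  labs(title = "Histogram for Age", x = "Age", y = "Count") +
  xlim(c(18, 52)) +                             # x-axis limits
  ylim(c(0, 30))                                # y-axis limits
p
```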

Again, let’s break this massive chunk of code into pieces to see exactly what each part contributes to the visualization of your histogram:
Bins
To adjust the bin width and the breakpoints, you can basically follow the general guidelines that were provided in the first part of the tutorial on histograms, since the arguments work alike. This means that you can add breaks to change the bin width:
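A minimal sketch with a made-up stand-in for the chol data; the breaks run from 20 to 50 in steps of 2.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# breaks gives the bin edges: 20, 22, 24, ..., 50
p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2))
p
```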

Note that you can name the by argument of the seq() function explicitly, as in seq(20, 50, by = 2), instead of passing the step as an unnamed last argument. This can be more informative, but it doesn’t change the resulting histogram! Remember that you could also express the same constraints on the bins with the c() function, but that this can make your code messy.
Names/Colors
To adjust the colors of your histogram, just add the arguments col and fill, together with the desired color:
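For example, with a made-up stand-in for the chol data and two example colors:

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# col sets the bin borders, fill sets the bin filling
p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green")
p
```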

The alpha argument controls the fill transparency. Remember to pass a value between 0 (transparent) and 1 (opaque):
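A sketch with alpha set to 0.2, using the same made-up stand-in for the chol data:

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# alpha = 0.2 makes the green filling mostly transparent
p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = 0.2)
p
```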

You can also fill the bins with colors according to the count numbers that are presented on the y-axis by passing ..count.., something that is not possible in the qplot() function:
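The sketch below uses after_stat(count), which is the spelling that newer ggplot2 releases recommend in place of ..count.. (this assumes ggplot2 3.3.0 or later). The data is a made-up stand-in for chol.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# Map the computed bin count to the fill color;
# after_stat(count) is the newer way of writing ..count..
p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(aes(fill = after_stat(count)),
                 breaks = seq(20, 50, by = 2),
                 col = "red",
                 alpha = 0.2)
p
```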
Setting the fill argument of aes() within geom_histogram() to ..count.. results in a variety of blue colors, which is actually the default color scheme. If you want to change this, you should add something more to your code: scale_fill_gradient(), which allows you to specify, for example:
- that you’re taking the count values from the y-axis,
- that the low values should be in green and
- that the higher values should appear in red:
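These three choices can be sketched as follows, again with a made-up stand-in for the chol data and with after_stat(count) standing in for ..count.. (assuming ggplot2 3.3.0 or later). The legend title "Count" is an example.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(aes(fill = after_stat(count)),
                 breaks = seq(20, 50, by = 2),
                 col = "red") +
  # Low counts in green, high counts in red; "Count" is the legend title
  scale_fill_gradient("Count", low = "green", high = "red")
p
```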

Remember that the ultimate purpose of adjusting your histogram should always be improving the understanding of it. Even though the histograms above look very fancy, they might not be exactly what you need, so always keep in mind what you’re trying to achieve!
Note that there are several more options to adjust the color of your histograms. If you want to experiment some more, you can find other arguments in the “Scales” section of the ggplot2 documentation page.
To adjust the title of your histogram, add the argument title:
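In this sketch the title is passed to labs(), as described earlier in this section; the data is a made-up stand-in for chol.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# labs(title = ...) sets the plot title
p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = 0.2) +
  labs(title = "Histogram for Age")
p
```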

To adjust the labels on the x- and y-axes of your histogram, add the arguments x and y, followed by a string of your choice:
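A sketch that adds both axis labels through labs(), with a made-up stand-in for the chol data and example label texts:

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

# x and y inside labs() set the axis labels
p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = 0.2) +
  labs(title = "Histogram for Age", x = "Age", y = "Count")
p
```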

X- and Y-Axes
Similar to the arguments that the hist() function uses to adjust the x- and y-axes, you can use the xlim() and ylim() functions. If you add these two functions, you end up with the histogram from the start of this section:
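A sketch of the finished call, with a made-up stand-in for the chol data; the limits 18 to 52 and 0 to 30 are example values chosen to leave some room around the bars.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

p <- ggplot(data = chol, aes(x = AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = 0.2) +
  labs(title = "Histogram for Age", x = "Age", y = "Count") +
  xlim(c(18, 52)) +  # lower and upper limit of the x-axis
  ylim(c(0, 30))     # lower and upper limit of the y-axis
p
```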

Tip: do not forget to use the c() function when you use xlim and ylim! And you should probably watch out for those parentheses, too :)
Extra: Trendline
You can easily add a trendline (a density curve) to your histogram by adding geom_density() to your code:
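In the sketch below, the histogram is drawn on the density scale so that the curve and the bars share one y-axis. It uses after_stat(density), the newer spelling of ..density.. (assuming ggplot2 3.3.0 or later), and a made-up stand-in for the chol data.

```r
library(ggplot2)

# Stand-in for the chol data: 200 made-up ages between 20 and 50
set.seed(42)
chol <- data.frame(AGE = sample(20:50, 200, replace = TRUE))

p <- ggplot(data = chol, aes(x = AGE)) +
  # Plot densities instead of counts on the y-axis
  geom_histogram(aes(y = after_stat(density)),
                 breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = 0.2) +
  # Overlay the estimated density curve
  geom_density(col = "blue") +
  labs(title = "Histogram for Age", x = "Age", y = "Density")
p
```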

Remember: just like with the hist() function, your histograms with ggplot2 also need to plot the density for this to work. Remember also that the hist() function required you to make a trendline by entering two separate commands, while ggplot2 allows you to do it all in one single command.
Step Five. Feeling Like Going Far and Beyond?
If you’re intrigued by the histograms that you can make with ggplot2, and if you want to discover what more you can do with this package, you can read about it on the RDocumentation page. It is a great starting point for anybody who is interested in taking ggplot2 to the next level.
If you already have some understanding of SAS, SPSS and STATA, and you want to discover more about ggplot2 but also other useful R packages, you might want to check out DataCamp’s course “R for SAS, SPSS and STATA Users”. The course is taught by Bob Muenchen, who is considered one of the prominent figures in the R community and whose book has briefly been mentioned in this tutorial.
This is the second of 3 posts on creating histograms with R. The next post will cover the creation of histograms using ggvis.