How to Make a Histogram with ggplot2

March 12th, 2015 in R Programming

In a previous blog post you learned how to make histograms with the hist() function. You can also make histograms by using ggplot2, “a plotting system for R, based on the grammar of graphics” that was created by Hadley Wickham. This post will focus on making a histogram with ggplot2.

Steps

(Want to learn how to do more plots with ggplot2? Try this interactive course on data visualization with ggplot2.)

Step One. Check That You Have ggplot2 Installed

First, go to the “Packages” tab in RStudio, an IDE to work with R efficiently, search for ggplot2 and mark the checkbox. If ggplot2 is not listed there, you need to install the package first. In this case, you stay in the same tab and you click on “Install”. Enter ggplot2, press ENTER and wait one or two minutes for the package to install.

You can also install ggplot2 from the console with the install.packages() function:

install.packages("ggplot2")

To effectively load the ggplot2 package, execute the following command:

library(ggplot2)

Step Two. The Data

Let’s leave the ggplot2 library for what it is for a bit and make sure that you have some dataset to work with: import the necessary file or use one that is built into R. This tutorial will again be working with the chol dataset.

If you’re just tuning in to this tutorial series, you can download this dataset from here.

You can load in the chol data set by using the url() function embedded into the read.table() function. Next, you can inspect whether the import was successful with functions such as head(), summary() and str():

# Load in `chol` data
chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"), header = TRUE)

# Inspect first rows of `chol` with `head()`
head(chol)

# Summary with `summary()`
summary(chol)

# Structure of `chol` with `str()`
str(chol)

Note that you use the head() function to retrieve the first rows of the chol data frame, while you use summary() to return a summary of the chol object. Lastly, you can use str() to display the structure of the chol data frame.

Tip: if you want to double check the class of the chol data frame, use the class() function, like this: class(chol).
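To see what header = TRUE and the inspection functions do without downloading anything, here is a minimal sketch on a made-up excerpt. The three rows and the HEIGHT and WEIGHT columns are assumptions for illustration, not the real contents of chol.txt; only the AGE column name comes from this tutorial.

```r
# Hypothetical three-row excerpt in a whitespace-separated layout;
# the values and the HEIGHT/WEIGHT columns are invented for illustration.
sample_text <- "AGE HEIGHT WEIGHT
20 176 77
53 167 56
44 170 80"

# header = TRUE tells read.table() that the first line holds column names
chol_sample <- read.table(text = sample_text, header = TRUE)

head(chol_sample, 2)   # first two rows
summary(chol_sample)   # per-column summary statistics
str(chol_sample)       # column types and a preview of the values
class(chol_sample)     # read.table() returns a data frame
```

The same calls work unchanged on the full chol data frame once it has been loaded.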

Step Three. Making Your Histogram With ggplot2

You have two options to make your histograms with the ggplot2 package. On the one hand, you can use the qplot() function, which looks very much like the hist() function:

# Compute a histogram of `chol$AGE`
qplot(chol$AGE, geom = "histogram")

You see that plotting with the qplot() function is easy: you pass in the data that you want to have on the x-axis, in this case chol$AGE, and you use the geom argument to specify the type of graph you want. By specifying "histogram", you indicate that you want to plot the distribution of chol$AGE.

On the other hand, you can also use the ggplot() function to make the same histogram. In this case, you take the dataset chol and pass it to the data argument. Next, pass the AGE column from the dataset as values on the x-axis and compute a histogram of this:

# Compute a histogram of `chol$AGE`
ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram()

As you saw before, ggplot2 is an implementation of the grammar of graphics, which means that there is a basic grammar to producing graphics: you need data and graphical elements to make your plots, just like you need a personal pronoun and a conjugated verb to make sentences. This means that you feed data to a plot as x and y elements and that you manipulate details such as colors and markers as graphical elements, which are added as layers.

This is exactly what happens in this plot: besides the data argument that you specify, you also add aes to describe how variables in the data (such as chol$AGE) are mapped to visual properties of geoms (geom_histogram() in this case, which is added as a layer).
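The layering idea can be sketched by building the plot object step by step and counting its layers. The demo data frame below is a hypothetical stand-in for chol (random ages, not the real data). This sketch also refers to the column by its bare name inside aes(), which ggplot2 looks up in the data argument; that is a common alternative to writing chol$AGE.

```r
library(ggplot2)

# Hypothetical stand-in for chol: 100 random ages between 20 and 50
set.seed(1)
demo <- data.frame(AGE = round(runif(100, min = 20, max = 50)))

# Data and aesthetic mapping only: no layer has been added yet
p <- ggplot(data = demo, aes(x = AGE))
length(p$layers)

# Adding a geom with `+` appends one layer to the plot object
p <- p + geom_histogram(binwidth = 5)
length(p$layers)
```

Printing p (or just typing p at the console) draws the histogram; until then, the object only stores the data, the mapping and the layers.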

But what is the difference between these two options?

The qplot() function is supposed to make the same graph as ggplot(), but with a simpler syntax. This might seem quite random, but it really isn’t if you understand where the name qplot() comes from: it’s short for “quick plot” and it’s a shortcut designed to be familiar if you’re used to base plot(). While ggplot() allows for maximum features and flexibility, qplot() is a simpler but less customizable wrapper around ggplot().

Note: in practice, ggplot() is used more often.
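As a rough check that the two options describe the same histogram, the sketch below builds both versions on a hypothetical stand-in for chol and compares the bin counts that ggplot2 computes. Note that ggplot2 3.4.0 and later mark qplot() as deprecated, so the call is wrapped in suppressWarnings() here; layer_data() is the ggplot2 helper that returns the computed values for a layer.

```r
library(ggplot2)

# Hypothetical stand-in for chol: 100 random ages between 20 and 50
set.seed(1)
demo <- data.frame(AGE = round(runif(100, min = 20, max = 50)))

# Quick version; suppressWarnings() hides the deprecation notice that
# newer ggplot2 releases emit for qplot()
q <- suppressWarnings(qplot(demo$AGE, geom = "histogram", binwidth = 5))

# Full version with an explicit histogram layer
g <- ggplot(demo, aes(x = AGE)) + geom_histogram(binwidth = 5)

# Compare the bin counts computed for each plot
same_counts <- identical(layer_data(q)$count, layer_data(g)$count)
same_counts
```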

Step Four. Taking It One Step Further

Now that you know how to make a basic histogram with this R package that is based on the grammar of graphics, it’s time to take things up a notch, and adjust the qplot() and the ggplot() that you have just made to customize it to your needs.

Adjusting qplot()

The options to adjust your histogram through qplot() are not too extensive, but this function does allow you to adjust the basics to improve the visualization and hence the understanding of the histograms. All you need to do is add some more arguments, just like you did with the hist() function.

You might have already seen the following message pop up for the previous histograms: `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. The message refers to the binwidth argument, which you can pass to qplot() or, in the ggplot() version, to geom_histogram() to change the width of the histogram bins.
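To make the effect of binwidth concrete, this sketch inspects the bins that ggplot2 computes for a histogram. It uses a hypothetical stand-in for chol and the layer_data() helper; the bin width of 5 mirrors the value used in this tutorial.

```r
library(ggplot2)

# Hypothetical stand-in for chol: 100 random ages between 20 and 50
set.seed(1)
demo <- data.frame(AGE = round(runif(100, min = 20, max = 50)))

# Setting binwidth explicitly also silences the `bins = 30` message
p <- ggplot(demo, aes(x = AGE)) + geom_histogram(binwidth = 5)

# One row per bin, with its boundaries (xmin, xmax) and its count
bins <- layer_data(p)
bin_widths <- bins$xmax - bins$xmin
bin_widths        # every bin spans 5 units on the x-axis
sum(bins$count)   # every observation lands in exactly one bin
```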

In any case, you could adjust the original plot to look like this:

# Histogram for `chol$AGE`
qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Age",
      xlab = "Age",
      fill = I("blue"),
      col = I("red"),
      alpha = I(.2),
      xlim = c(20, 50))

Tip: compare the arguments to the ones that are used in the hist() function in the first part of this tutorial series to get some more insight!

You’ll have a histogram for the AGE column in the chol dataset, with title Histogram for Age and label for the x-axis (Age), with bins of a width of 5 that range from values 20 to 50 on the x-axis and that have transparent blue filling and red borders.

Since the R commands are only getting longer and longer, you might need some help to understand what each part of the code does to the histogram’s appearance.

Let’s just break it down to smaller pieces:

Bins

You can change the binwidth by specifying a binwidth argument in your qplot() function. Play around with the binwidth in the DataCamp Light chunk below:

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5)

Names/colors

As with the hist() function, you can use the argument main to change the title of the histogram:

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Age")

To change the labels that refer to the x- and y-axes, use xlab and ylab, just like you do when you use the hist() function.

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Age",
      xlab = "Age")

However, if you want to adjust the colors of your histogram, you have to take a slightly different approach than with the hist() function:

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Age",
      xlab = "Age",
      fill = I("blue"))

The same approach applies if you want to change the border of the bins: you add the col argument and wrap the color in the I() function:

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Age",
      xlab = "Age",
      fill = I("blue"),
      col = I("red"))

The I() function inhibits the interpretation of its arguments. In this case, the col argument is affected. Without it, the qplot() function would print a legend saying col = "red", which is definitely not what you want in this case (Muenchen et al. 2010).

Tip: try removing the I() function and see for yourself what happens!
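The difference between a color that is used literally and one that is interpreted as data can also be sketched with ggplot(), where the same distinction shows up as setting an argument outside aes() versus mapping it inside aes(). The demo data frame is a hypothetical stand-in for chol.

```r
library(ggplot2)

# Hypothetical stand-in for chol: 100 random ages between 20 and 50
set.seed(1)
demo <- data.frame(AGE = round(runif(100, min = 20, max = 50)))

# I() only tags its argument with the class "AsIs", which tells
# qplot() to use the value as is rather than map it through a scale
class(I("blue"))

# Setting: the bars are literally blue and no legend is drawn
p_set <- ggplot(demo, aes(x = AGE)) +
  geom_histogram(binwidth = 5, fill = "blue")

# Mapping: "blue" is treated as a one-level variable, so the bars get
# the default scale color and a legend labelled "blue" appears
p_map <- ggplot(demo, aes(x = AGE)) +
  geom_histogram(binwidth = 5, aes(fill = "blue"))

fill_set <- unique(layer_data(p_set)$fill)
fill_map <- unique(layer_data(p_map)$fill)
fill_set
fill_map
```

Leaving out I() in qplot() behaves like the mapping case here.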

If you want to set the transparency of the bins’ filling, just add the argument alpha, together with a value that is between 0 (fully transparent) and 1 (opaque). In the code below, set alpha to 0.2:

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Age",
      xlab = "Age",
      fill = I("blue"),
      col = I("red"),
      alpha = I(.2))

Note that the I() function is used here also! Again, try to leave this function out and see what effect this has on the histogram.

X- and Y-Axes

The qplot() function also allows you to set limits on the values that appear on the x- and y-axes. Just use xlim and ylim, in the same way as was described for the hist() function in the first part of this tutorial on histograms. After adding the xlim argument and some reasonable parameters, you end up with the histogram from the start of this section:

qplot(chol$AGE,
      geom = "histogram",
      binwidth = 5,
      main = "Histogram for Age",
      xlab = "Age",
      fill = I("blue"),
      col = I("red"),
      alpha = I(.2),
      xlim = c(20, 50))

Tip: do not forget to use the c() function to specify xlim and ylim!

Adjusting ggplot()

Just like the two other options that have been discussed so far, adjusting your histogram through the ggplot() function is also very easy. The general message stays the same: just add more code to the original code that plots your (basic) histogram!

This way, you can customize your basic ggplot!

In the following exercise, you’ll use the chol data again to make a histogram. More specifically, you’ll plot the chol$AGE data along the x-axis. After that, you’ll use the geom_histogram() function to tell ggplot2 that you’re actually interested in plotting the distribution of chol$AGE with the help of a histogram. Lastly, you customize your ggplot by adding labs(), to which you’ll pass the title, x and y arguments to add labels, and xlim() and ylim() to set the limits of the x- and y-axes.

Try this out in the following interactive exercise:

ggplot(data = chol, aes(x = chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2) +
  labs(title = "Histogram for Age", x = "Age", y = "Count") +
  xlim(c(18, 52)) +
  ylim(c(0, 30))

Again, let’s break this huge chunk of code into pieces to see exactly what each part contributes to the visualization of your histogram:

Bins

To adjust the bin width and the breakpoints, you can basically follow the general guidelines that were provided in the first part of the tutorial on histograms, since the arguments work alike. This means that you can add breaks to change the bin width:

# Plot the histogram and set `breaks`
ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2))

Note that you can name the by argument of seq() explicitly, as in seq(20, 50, by = 2), or pass it by position as the third argument, as in seq(20, 50, 2). Naming it can be more informative, but it doesn’t change the resulting histogram!

Remember that you could also express the same constraints on the bins with the c() function, but that this can make your code messy.
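A quick base R comparison shows why seq() is the tidier choice for the breakpoints used in this tutorial; both vectors below describe the same bin edges.

```r
# Breakpoints generated with seq(): from 20 to 50 in steps of 2
breaks_seq <- seq(20, 50, by = 2)

# The same breakpoints written out by hand with c()
breaks_c <- c(20, 22, 24, 26, 28, 30, 32, 34,
              36, 38, 40, 42, 44, 46, 48, 50)

# Both approaches give the same vector
identical(breaks_seq, breaks_c)

# n breakpoints delimit n - 1 bins
length(breaks_seq) - 1
```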

Names/Colors

To adjust the colors of your histogram, just add the arguments col and fill, together with the desired color:

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green")

The alpha argument controls the fill transparency. Remember to pass a value between 0 (transparent) and 1 (opaque):

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2)

You can also fill the bins with colors according to the count numbers that are presented in the y-axis by passing ..count.., something that is not possible in the qplot() function:

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 aes(fill = ..count..))

Setting the fill argument of aes() within geom_histogram() to ..count.. results in a variety of blue colors, which is actually the default color scheme. If you want to change this, you should add something more to your code: the scale_fill_gradient(), which allows you to specify, for example:

  • that you’re taking the count values from the y-axis,
  • that the low values should be in green and
  • that the higher values should appear in red:

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 aes(fill = ..count..)) +
  scale_fill_gradient("Count", low = "green", high = "red")

Remember that the ultimate purpose of adjusting your histogram should always be improving the understanding of it. Even though the histograms above look very fancy, they might not be exactly what you need, so always keep in mind what you’re trying to achieve!

Note that there are several more options to adjust the color of your histograms. If you want to experiment some more, you can find other arguments in the “Scales” section of the ggplot documentation page.
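If you run the count-based fill on a recent ggplot2 release, you may see a notice about the ..count.. notation: ggplot2 3.3.0 introduced after_stat() for referring to computed variables, and 3.4.0 deprecated the double-dot form. A minimal sketch of the same gradient with after_stat(), on a hypothetical stand-in for chol, could look like this:

```r
library(ggplot2)

# Hypothetical stand-in for chol: 100 random ages between 20 and 50
set.seed(1)
demo <- data.frame(AGE = round(runif(100, min = 20, max = 50)))

# after_stat(count) refers to the per-bin count computed by the
# histogram stat, just like ..count.. does in older code
p <- ggplot(demo, aes(x = AGE)) +
  geom_histogram(aes(fill = after_stat(count)),
                 breaks = seq(20, 50, by = 2),
                 col = "red") +
  scale_fill_gradient("Count", low = "green", high = "red")

# Each bin gets a fill color between green and red based on its count
fills <- layer_data(p)$fill
unique(fills)
```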

To adjust the title of your histogram, add the argument title:

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2) +
  labs(title = "Histogram for Age")

To adjust the labels on the x- and y-axes of your histogram, add the arguments x and y to labs(), each followed by a string of your choice:

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2) +
  labs(title = "Histogram for Age", x = "Age", y = "Count")

X- And Y-Axes

Similar to the arguments that the hist() function uses to adjust the x- and y-axes, you can use the xlim() and ylim() functions. If you add these two functions, you end up with the histogram from the start of this section:

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2) +
  labs(title = "Histogram for Age", x = "Age", y = "Count") +
  xlim(c(18, 52)) +
  ylim(c(0, 30))

Tip: here xlim() and ylim() are functions rather than arguments. They accept the two limits either wrapped in c(), as above, or as two separate values, so watch out for those parentheses :)
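One detail worth knowing about xlim() is that it sets scale limits, so observations outside the limits are dropped before the bins are computed. If you only want to zoom in, coord_cartesian() keeps all the data. The sketch below, on a hypothetical stand-in for chol with deliberately narrow limits, compares how many observations end up in the bins in each case.

```r
library(ggplot2)

# Hypothetical stand-in for chol: 100 random ages between 20 and 50
set.seed(1)
demo <- data.frame(AGE = round(runif(100, min = 20, max = 50)))

p_base <- ggplot(demo, aes(x = AGE)) + geom_histogram(binwidth = 5)

# Scale limits: ages outside 30..40 are removed before binning
p_lim <- p_base + xlim(c(30, 40))

# Coordinate limits: all ages are binned, the view is only zoomed
p_zoom <- p_base + coord_cartesian(xlim = c(30, 40))

# suppressWarnings() hides the notice about the removed rows
n_lim  <- sum(suppressWarnings(layer_data(p_lim))$count)
n_zoom <- sum(layer_data(p_zoom)$count)
n_lim
n_zoom
```

With limits as wide as the ones in this tutorial (18 to 52 for ages between 20 and 50), nothing is dropped, so both approaches give the same picture.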

Extra: Trendline

You can easily add a trendline to your histogram by adding geom_density to your code:

ggplot(data = chol, aes(chol$AGE)) +
  geom_histogram(aes(y = ..density..),
                 breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2) +
  geom_density(col = 2) +
  labs(title = "Histogram for Age", x = "Age", y = "Count")

Remember: just like with the hist() function, your histograms with ggplot2 also need to plot the density for this to work. Remember also that the hist() function required you to make a trendline by entering two separate commands while ggplot2 allows you to do it all in one single command.
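Why the histogram has to be on the density scale can be checked numerically: on that scale the bar areas add up to 1, which is the same scale that geom_density() draws on. This sketch uses a hypothetical stand-in for chol and the newer after_stat(density) spelling of ..density.. (available from ggplot2 3.3.0).

```r
library(ggplot2)

# Hypothetical stand-in for chol: 100 random ages between 20 and 50
set.seed(1)
demo <- data.frame(AGE = round(runif(100, min = 20, max = 50)))

p <- ggplot(demo, aes(x = AGE)) +
  geom_histogram(aes(y = after_stat(density)),
                 breaks = seq(20, 50, by = 2),
                 col = "red",
                 fill = "green",
                 alpha = .2) +
  geom_density(col = 2)

# Layer 1 is the histogram: bar height times bar width, summed over bins
bars <- layer_data(p, 1)
area <- sum(bars$density * (bars$xmax - bars$xmin))
area
```

Since the y-axis then shows densities rather than counts, a label such as "Density" may describe it more accurately.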

Step Five. Feeling Like Going Far And Beyond?

If you’re intrigued by the histograms that you can make with ggplot2, and if you want to discover what more you can do with this package, you can read about it on the RDocumentation page. It is a great starting point for anybody that is interested in taking ggplot2 to the next level.

If you already have some understanding of SAS, SPSS and STATA and you want to discover more about ggplot2 but also other useful R packages, you might want to check out DataCamp’s course “R for SAS, SPSS and STATA Users”. The course is taught by Bob Muenchen, who is considered one of the prominent figures in the R community and whose book has briefly been mentioned in this tutorial.

This is the second of 3 posts on creating histograms with R. The next post will cover the creation of histograms using ggvis. Spotted a mistake? Send us a tweet!
