# How to Make a Histogram with Basic R Tutorial

Tutorial for new R users who need an accessible and easy-to-understand resource on how to create their own histograms with basic R.
Mar 2019  · 10 min read

This is the first post in an R tutorial series that covers the basics of how you can create your own histograms in R. Three options will be explored: basic R commands, ggplot2 and ggvis. These posts are aimed at beginning and intermediate R users who need an accessible and easy-to-understand resource.

This post covers how to make histograms with basic R commands.

## What is a Histogram?

A histogram is a visual representation of the distribution of a dataset. As such, the shape of a histogram is its most evident and informative characteristic: it allows you to easily see where a relatively large amount of the data is situated and where there is very little data to be found (Verzani 2004). In other words, you can see where the middle is in your data distribution, how close the data lie around this middle and where possible outliers are to be found. Because of all this, histograms are a great way to get to know your data!

But what does that specific shape of a histogram exactly look like?

In short, the histogram consists of an x-axis, a y-axis and various bars of different heights. The y-axis shows how frequently the values on the x-axis occur in the data, while the bars group ranges of values or continuous categories on the x-axis. The latter explains why histograms don’t have gaps between the bars.

Note that the bars of a histogram are often called “bins”; this tutorial uses that name as well.
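To make the idea of bins concrete, here is a minimal sketch that counts values per bin by hand with base R's `cut()` and `table()`. The breakpoints (100 to 700 in steps of 100) are an arbitrary choice for illustration; `hist()` does this counting for you.

```r
# Built-in monthly airline passenger totals (144 values)
x <- as.numeric(AirPassengers)

# Breakpoints chosen for illustration only: bins [100,200], (200,300], ..., (600,700]
breakpoints <- seq(100, 700, by = 100)

# cut() assigns each value to a bin; right = TRUE closes bins on the right and
# include.lowest = TRUE closes the first bin on the left, matching hist()'s defaults
bins <- cut(x, breaks = breakpoints, right = TRUE, include.lowest = TRUE)

# table() counts the values in each bin: these counts are the bar heights
bin_counts <- table(bins)
print(bin_counts)
```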

## How to Make a Histogram with Basic R

### 1. Show me the Data

Since a histogram needs data to plot, start by importing a dataset or by using one that is built into R.

This tutorial uses two datasets: the built-in R dataset `AirPassengers` and a dataset named `chol`, which is stored in a .txt file that is available for download.

Before you can use `chol` in your histograms, read the text file into R with the `read.table()` function:

```r
chol <- read.table(url("http://assets.datacamp.com/blog_assets/chol.txt"), header = TRUE)
```
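If you want to see how `read.table()` with `header = TRUE` treats a text file without downloading anything, a minimal sketch is to pass a small inline table through the `text` argument. The column names and values below are made up for illustration and are not the contents of `chol.txt`.

```r
# Hypothetical whitespace-separated data; the first line holds the column names
toy_text <- "
AGE HEIGHT
25  170
41  182
37  165
"

# header = TRUE tells read.table() to use the first line as column names
toy <- read.table(text = toy_text, header = TRUE)

str(toy)   # a data frame with 3 rows and the columns AGE and HEIGHT
```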

### 2. Familiarize yourself with the `hist()` Function

The simplest way to make a histogram is the `hist()` function, which computes (and by default plots) a histogram of the given data values. Put the name of your dataset between the parentheses of this function, like this:

```r
hist(AirPassengers)
```

This produces a histogram of the monthly airline passenger totals in `AirPassengers`, drawn with R's default settings.
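Besides drawing the plot, `hist()` also returns the numbers behind it as an object of class `"histogram"`. Setting `plot = FALSE` computes these numbers without drawing anything, which is a quick way to inspect the bins R chose by default.

```r
# Compute the histogram without plotting it
h <- hist(AirPassengers, plot = FALSE)

class(h)   # "histogram"
h$breaks   # the breakpoints between bins
h$counts   # the number of observations in each bin
h$mids     # the midpoint of each bin
```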

To make a histogram of only one column of a data frame, such as `chol`, pass `hist()` the name of the data frame followed by the `$` operator and the column name:

```r
hist(chol$AGE)
```

This assumes that you have already read `chol` into R with `read.table()`, as shown in step 1.

This code computes a histogram of the values in the `AGE` column of the data frame named `chol`.
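`$` is not the only way to pull a column out of a data frame. The sketch below uses the built-in `mtcars` data frame (an assumption, so that it runs without `chol`) to show that `$` and `[[` return the same numeric vector; `[[` is handy when the column name is stored in a variable.

```r
# Built-in data frame; mpg is its miles-per-gallon column
col_name <- "mpg"

via_dollar  <- mtcars$mpg          # select a column by writing its name directly
via_bracket <- mtcars[[col_name]]  # select a column whose name is held in a variable

# Both give the same numeric vector, so either works as input to hist()
hist(via_bracket, main = paste("Histogram of", col_name), xlab = col_name)
```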


### 3. Take the `hist()` Function up a Notch

The histograms of the previous section look a bit dull, don’t they?

The default plots don't always make your data easy to read, so it often pays to take one more step. Luckily, this is not hard: R offers several quick ways to improve how a histogram looks while still using the `hist()` function.

In order to adapt your histogram, you merely need to add more arguments to the `hist()` function, just like this:

```r
hist(AirPassengers,
     main = "Histogram for Air Passengers",
     xlab = "Passengers",
     border = "blue",
     col = "green",
     xlim = c(100, 700),
     las = 1,
     breaks = 5)
```

This code computes a histogram of the data values from the dataset `AirPassengers`, gives it “Histogram for Air Passengers” as title, labels the x-axis as “Passengers”, gives the bins a blue border and a green fill, limits the x-axis to the range `100` to `700`, prints the y-axis labels horizontally (`las = 1`) and asks for about `5` bins (`breaks = 5`).

Do you feel slightly overwhelmed by this large string of code? No worries!

The following sections will break down the above code chunk into smaller pieces to see what each argument, such as `main`, `col`, …, does.

#### Names/colors

You can change the title of the histogram by adding `main` as an argument to the `hist()` function. In this case, you make a histogram of the `AirPassengers` data set with the title “Histogram for Air Passengers”:

```r
hist(AirPassengers, main = "Histogram for Air Passengers")
```

If you want to adjust the label of the x-axis, add `xlab`. Similarly, you can also use `ylab` to label the y-axis:

```r
hist(AirPassengers, xlab = "Passengers", ylab = "Frequency of Passengers")
```

The code above makes a histogram of the `AirPassengers` data set with custom labels on the x- and y-axes.

To change the colors of the default histogram, add the `border` and/or `col` arguments. As their names suggest, `border` sets the color of the bin outlines and `col` sets the fill color of the bins. In the following code chunk, your histogram will have blue-bordered bins with a green fill:

```r
hist(AirPassengers, border = "blue", col = "green")
```

Tip: don't forget to put color names and titles between quotes (`""`).
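Color names such as `"green"` come from R's built-in list, which `colors()` returns. You can also pass hexadecimal codes or build a color with `rgb()`. The sketch below uses a semi-transparent fill and a hex border; both colors are arbitrary illustrative choices, not recommendations.

```r
# Check that a color name is one R recognizes
"green" %in% colors()   # TRUE

# A semi-transparent green built with rgb(): red, green, blue and alpha in [0, 1]
fill <- rgb(0, 0.6, 0, alpha = 0.5)

hist(AirPassengers,
     border = "#1F3A93",  # hexadecimal color codes are also accepted
     col = fill)
```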

#### X and Y Axes

Change the range of the `x` and `y` values on the axes by adding `xlim` and `ylim` as arguments to the `hist()` function:

```r
hist(AirPassengers, xlim = c(100, 700), ylim = c(0, 30))
```

In the code chunk above, your histogram has an x-axis that is limited to values `100` to `700`, and the y-axis is limited to values `0` to `30`.

Note that `xlim` and `ylim` each take a vector of two values, created with the `c()` function: the first is the lower limit of the axis and the second is the upper limit.
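Hard-coded limits such as `c(0, 30)` can cut off bars if the data change. A sketch of one alternative, assuming you are happy with the default bins, computes the histogram first with `plot = FALSE` and derives both limits from it.

```r
# Compute the bins and counts without drawing
h <- hist(AirPassengers, plot = FALSE)

# Span the outermost breakpoints on the x-axis and the tallest bar on the y-axis
x_limits <- range(h$breaks)
y_limits <- c(0, max(h$counts))

hist(AirPassengers, xlim = x_limits, ylim = y_limits)
```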

You can rotate the axis labels by adding the `las` argument, for example `las = 1`. `las` can take the values `0`, `1`, `2` or `3`.

The value you choose sets the orientation of the tick labels on the axes: `0` keeps the labels parallel to the axis (the default), `1` makes them always horizontal, `2` makes them perpendicular to the axis and `3` makes them always vertical.

```r
hist(AirPassengers, las = 1)
```

Here, the labels on the y-axis are printed horizontally because you pass the value `1` to the `las` argument. Try changing the value that you pass to `las` and see the effect!
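To compare the four `las` settings side by side, you can draw one histogram per value in a 2 x 2 grid. `par(mfrow = ...)` splits the plotting area; the sketch saves the old setting and restores it afterwards so later plots are unaffected.

```r
# Split the plotting device into 2 rows and 2 columns, remembering the old setting
old_par <- par(mfrow = c(2, 2))

for (style in 0:3) {
  hist(AirPassengers,
       las = style,                    # orientation of the axis labels
       main = paste("las =", style))
}

# Restore the previous graphical parameters
par(old_par)
```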

#### Bins

You can change the number of bins, and therefore their width, by passing a single number to the `breaks` argument:

```r
hist(AirPassengers, breaks = 5)
```

R treats a single number passed to `breaks` only as a suggestion: it rounds the breakpoints to “pretty” values, so the resulting histogram does not necessarily have exactly 5 bins.
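Because the number is only a suggestion, it is worth checking which breakpoints R actually used. A quick way is to compute the histogram with `plot = FALSE` and look at its `breaks` component.

```r
# Ask for about 5 bins and inspect what R actually chose
h5 <- hist(AirPassengers, breaks = 5, plot = FALSE)

h5$breaks               # breakpoints rounded to "pretty" values
length(h5$breaks) - 1   # the actual number of bins
```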

For more control over where the bins begin and end, give the `breaks` argument a vector of breakpoints instead. You can build this vector with the `c()` function:

```r
hist(AirPassengers, breaks = c(100, 300, 500, 700))
```

The resulting histogram has three bins, running from `100` to `300`, from `300` to `500` and from `500` to `700`.
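When you pass explicit breakpoints, they must cover the whole range of the data; otherwise `hist()` stops with an error saying that some values were not counted. A sketch of a defensive check compares the breakpoints with the data's minimum and maximum first.

```r
x <- as.numeric(AirPassengers)
my_breaks <- c(100, 300, 500, 700)

# The first breakpoint must be <= the minimum and the last >= the maximum
covers_data <- min(my_breaks) <= min(x) && max(my_breaks) >= max(x)

if (covers_data) {
  hist(x, breaks = my_breaks)
} else {
  message("Breakpoints do not span the range of the data: ",
          paste(range(x), collapse = " to "))
}

# Breakpoints that stop at 400 leave larger values uncounted, which hist() rejects
too_short <- tryCatch(hist(x, breaks = c(100, 250, 400), plot = FALSE),
                      error = function(e) e)
inherits(too_short, "error")
```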

However, typing out every breakpoint with `c()` can make your code messy. Instead, you can generate the breakpoints with `seq(x, y, z)`, where `x` is the first breakpoint, `y` is the upper end and `z` is the step between consecutive breakpoints, that is, the bin width.
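A quick way to see what `seq()` produces before handing it to `breaks` is to print it. Note that `seq()` never goes past the end value: it stops at the last value that fits. The 50-passenger bin width in the last line is an arbitrary illustrative choice.

```r
seq(100, 700, 100)   # 100 200 300 400 500 600 700
seq(200, 700, 150)   # 200 350 500 650 (the next value, 800, would exceed 700)

# Equal-width bins from 100 to 700, each 50 passengers wide
hist(AirPassengers, breaks = seq(100, 700, 50))
```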

Note that you can also combine the two functions:

```r
hist(AirPassengers, breaks = c(100, seq(200, 700, 150)))
```

Here, `seq(200, 700, 150)` returns `200`, `350`, `500` and `650`, so the breakpoints are `100`, `200`, `350`, `500` and `650`: the first bin runs from `100` to `200`, and the remaining bins are `150` wide. `seq()` stops at `650` because the next value, `800`, would exceed `700`. Run the code to see the result!

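Tip: watch the y-axis closely when you experiment with the numbers in the `seq()` call! When the bins have unequal widths, `hist()` plots densities instead of counts by default, so that each bar's area, not its height, reflects how many observations fall into the bin.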
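You can check whether `hist()` considers your bins equal-width through the `equidist` component of the returned object. When it is `FALSE` and you don't set `freq` yourself, the y-axis shows densities rather than counts.

```r
equal   <- hist(AirPassengers, breaks = c(100, 300, 500, 700), plot = FALSE)
unequal <- hist(AirPassengers, breaks = c(100, seq(200, 700, 150)), plot = FALSE)

equal$equidist     # TRUE: the y-axis shows counts by default
unequal$equidist   # FALSE: the y-axis shows densities by default

# Density = count / (total count * bin width), so wide bins are not inflated
unequal$density
```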

Keep in mind that bins of different widths can confuse readers, and that the most interesting parts of your data may end up de-emphasized or even hidden compared with your original histogram. So experiment with this technique and see what suits your purposes best!

#### Extra: Probability Density

By default, the `hist()` function shows the frequency (count) of each bin on the y-axis. If you would rather see how likely it is that a value falls into a given interval of the x-axis, you need a probability density scale instead: the bar heights are rescaled so that the total area of all bars is 1. To get this histogram of proportions, set the `freq` argument to `FALSE` or, equivalently, set `prob` to `TRUE` (`prob` is an abbreviation of the `probability` argument, which R accepts through partial argument matching):

```r
hist(AirPassengers,
     main = "Histogram for Air Passengers",
     xlab = "Passengers",
     border = "blue",
     col = "green",
     xlim = c(100, 700),
     las = 1,
     breaks = 5,
     prob = TRUE)
```
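On the density scale, each bar's area (height times width) is the proportion of observations in that bin, so the areas add up to 1. The sketch below checks this for the default bins using the `density` component that `hist()` returns.

```r
h <- hist(AirPassengers, plot = FALSE)

bin_widths <- diff(h$breaks)
areas <- h$density * bin_widths   # proportion of observations per bin

sum(areas)                                   # 1 (up to floating-point error)
all.equal(areas, h$counts / sum(h$counts))   # TRUE
```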

After you’ve called the `hist()` function with `prob = TRUE` to create a probability density plot, you can overlay a density curve, a smoothed estimate of the distribution computed by `density()`, with the `lines()` function:

```r
hist(AirPassengers,
     main = "Histogram for Air Passengers",
     xlab = "Passengers",
     border = "blue",
     col = "green",
     xlim = c(100, 700),
     las = 1,
     breaks = 5,
     prob = TRUE)

lines(density(AirPassengers))
```

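Note that you should set the `prob` argument of the histogram to `TRUE` first! On the default frequency scale, the y-axis runs up to the bin counts, while density values are tiny by comparison, so the curve would be squashed flat along the bottom of the plot.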
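The density curve may rise above the tallest bar, in which case its peak is clipped at the top of the plot. One way to avoid this, sketched below, is to compute the density first and set `ylim` so that both the bars and the curve fit. The red color and line width are arbitrary choices.

```r
d <- density(AirPassengers)                          # kernel density estimate
h <- hist(AirPassengers, breaks = 5, plot = FALSE)   # same bins as the plot below

# Make the y-axis tall enough for both the bars and the curve
top <- max(h$density, d$y)

hist(AirPassengers, breaks = 5, prob = TRUE, ylim = c(0, top),
     main = "Histogram for Air Passengers", xlab = "Passengers")
lines(d, col = "red", lwd = 2)
```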

## Want to go further?

For an exhaustive list of all the arguments that you can add to the `hist()` function, have a look at the RDocumentation article on the `hist()` function.

This is the first of three posts on creating histograms with R. The next post covers the creation of histograms using ggplot2.
