# How to Make a Histogram in Base R

Discover how to create a histogram with Base R using our comprehensive 6-step tutorial. Customize your plots and visualize data distributions effectively.
Updated Jul 2024  · 10 min read

In this tutorial, we will be visualizing distributions of data by plotting histograms using the R programming language. We will cover what a histogram is, how to read data in R, how to create a histogram, and how to customize the plot.

We will be using the base R programming language with no additional packages. This approach is especially useful when additional packages cannot be used or when you are looking for quick exploratory analyses. In other cases, you might consider using `ggplot2`, as covered in our How to Make a ggplot2 Histogram in R tutorial.

To easily run all the example code in this tutorial yourself, you can create a DataLab workbook for free that has R pre-installed and contains all code samples. For more practice on how to make a histogram in R, check out this hands-on DataCamp exercise.

## What Is a Histogram?

A histogram is a popular graph for showing the frequency distribution of a continuous (numeric) variable. It divides the range of the variable into intervals (bins) and shows the count of observations that fall within each one.

Histograms look similar to bar charts. A key difference between the two is that bar charts have a value associated with a specific category or discrete variable, while a histogram visualizes frequencies for continuous variables.
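
To make the idea of counting observations within ranges concrete, here is a minimal sketch using a small made-up vector. The values and the names `x`, `edges`, and `manual_counts` are illustrative and not part of the housing data. It bins the values by hand with `cut()` and `table()`, then compares the result with the counts that `hist()` computes when called with `plot = FALSE`.

```r
# Hypothetical example values, not taken from the housing dataset
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.9, 6.0, 7.5)

# Bin edges for the intervals (0, 2], (2, 4], (4, 6], (6, 8]
edges <- seq(0, 8, by = 2)

# Count the observations in each interval by hand
manual_counts <- as.vector(table(cut(x, breaks = edges)))

# Let hist() do the same binning without drawing anything
h <- hist(x, breaks = edges, plot = FALSE)

print(manual_counts)
print(h$counts)
```

Both approaches should report the same count for each interval, which is exactly what the bar heights of a histogram represent.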

## Setting Up Data for Histograms

We will be using a housing dataset that includes details about different house listings, including the size of the house, the number of rooms, the price, and location information. We can read the data using the `read.csv()` function, either directly from the URL or by downloading the CSV file into a directory and reading it from local storage. We can also keep only the columns we are interested in for this tutorial: price and condition.

``home_data <- read.csv("https://raw.githubusercontent.com/rashida048/Datasets/master/home_data.csv")[, c("price", "condition")]``

Let’s look at the first few rows of data using the `head()` function:

``head(home_data, 5)``
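
The local-file route mentioned above can be sketched as follows. To keep the example self-contained, it first writes a tiny stand-in CSV to a temporary file. The rows, the extra `bedrooms` column, and the names `sample_rows`, `csv_path`, and `home_sample` are invented for illustration and only mimic the shape of the real dataset. In practice, you would point `read.csv()` at the path where you saved the downloaded file.

```r
# Invented stand-in rows that mimic the shape of the real file
sample_rows <- data.frame(
  price     = c(250000, 400000, 325000),
  bedrooms  = c(3, 4, 2),
  condition = c(3, 4, 3)
)

# Write the stand-in data to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# Read the file back and keep only the two columns used in this tutorial
home_sample <- read.csv(csv_path)[, c("price", "condition")]

# Inspect the structure and check for missing values before plotting
str(home_sample)
print(colSums(is.na(home_sample)))
```

Checking the column types and missing values up front is a cheap way to catch import problems before they show up as odd-looking plots.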

## Creating Histograms with Base R

Next, we will create a histogram using the `hist()` function to look at the distribution of prices in our dataset.

``hist(home_data$price)``

Basic histogram of home prices. Image by Author.

### Adding descriptive statistics

We can add descriptive statistics to the histogram using the `abline()` function. This adds a vertical line to the plot.

• Set the `v` argument to the position on the x-axis for the vertical line. Here, we get the mean house price using `mean()`.
• The `col` argument sets the line color, in this case to red.
• The `lwd` argument sets the line width. A value of 3 increases the thickness of the line to make it easier to see.

```r
hist(home_data$price)

abline(v = mean(home_data$price), col = "red", lwd = 3)
```
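
When a distribution has a long right tail, it can help to mark the median alongside the mean. The sketch below does this with a second `abline()` call and a `legend()`. It uses simulated, log-normally distributed values in a vector called `prices` as a stand-in for `home_data$price`; this is an assumption made so the example runs without downloading the data.

```r
# Simulated right-skewed stand-in for home_data$price (not the real data)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

price_mean   <- mean(prices)
price_median <- median(prices)

hist(prices)

# Solid red line at the mean, dashed blue line at the median
abline(v = price_mean, col = "red", lwd = 3)
abline(v = price_median, col = "blue", lwd = 3, lty = 2)

# Label the two lines
legend("topright",
       legend = c("Mean", "Median"),
       col = c("red", "blue"),
       lwd = 3,
       lty = c(1, 2))
```

In a right-skewed distribution, the mean sits to the right of the median because the long tail pulls it upward.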

### Plotting probability densities

To add a probability density line to the histogram, we first change the y-axis to be scaled to density. In the call to `hist()`, we set the `probability` argument to `TRUE` (equivalent to setting `freq = FALSE`).

The probability density line is made with a combination of `density()`, which calculates the position of the probability density curve, and `lines()`, which adds the line to the existing plot.

```r
hist(home_data$price, probability = TRUE)
abline(v = mean(home_data$price), col = "red", lwd = 3)
lines(density(home_data$price), col = "green", lwd = 3)
```

Histogram of home prices with the average highlighted. Image by Author.

Notice that the numbers on the y-axis have changed: they now show densities rather than counts, scaled so that the total area of the bars equals 1.
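
To see what the density scale means, the following sketch inspects the object that `hist()` returns instead of drawing it. It again uses simulated values in `prices` as a stand-in for `home_data$price` (an assumption for the sake of a self-contained example). Each bar's area is its height multiplied by its bin width, and those areas should add up to 1.

```r
# Simulated right-skewed stand-in for home_data$price (not the real data)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# Compute the histogram without drawing it
h <- hist(prices, plot = FALSE)

# Area of each bar = density (height) * bin width
bar_areas <- h$density * diff(h$breaks)
print(sum(bar_areas))

# Density is the count divided by (number of observations * bin width)
manual_density <- h$counts / (length(prices) * diff(h$breaks))
print(all.equal(manual_density, h$density))
```

Because the bars and the curve from `density()` are both on this scale, they can share the same y-axis.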

### Customizing the color

We can change the fill color of the bars using the `col` parameter of the `hist()` function. We will change the fill to blue. We can also change the outline color of the bars using the `border` parameter. We will change the outlines to white.

``hist(home_data$price, col = "blue", border = "white")``

Histogram of home prices with color added. Image by Author.

### Adding labels and titles

We can change the labels on the plot to make it more readable and presentable. This is useful if you share the plot with others.

• `xlab` sets the x-axis label
• `ylab` sets the y-axis label
• `main` sets the plot title

``hist(home_data$price, xlab = "Price (USD)", ylab = "Number of Listings", main = "Distribution of House Prices")``

Histogram of home prices with axis labels. Image by Author.
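
Large values such as prices may be printed on the axis in scientific notation (for example `1e+06`), which can be hard to read. One possible way around this, sketched below under the assumption that simulated `prices` stand in for `home_data$price`, is to suppress the default x-axis with `xaxt = "n"` and draw it again with `axis()` using formatted tick labels. The names `ticks` and `tick_labels` are illustrative.

```r
# Simulated right-skewed stand-in for home_data$price (not the real data)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# Draw the histogram without the default x-axis
hist(prices,
     col = "blue",
     border = "white",
     xlab = "Price (USD)",
     ylab = "Number of Listings",
     main = "Distribution of House Prices",
     xaxt = "n")

# Reuse the tick positions R would have chosen for the x-axis
ticks <- axTicks(1)

# Format the tick labels with thousands separators, never scientific notation
tick_labels <- format(ticks, big.mark = ",", scientific = FALSE, trim = TRUE)
axis(side = 1, at = ticks, labels = tick_labels)
```

This keeps the tick positions R picked automatically and only changes how they are written.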

### Binning using breaks

With the default arguments, it is challenging to see the full distribution of the housing prices across the range of prices. We can see that the prices are concentrated in the first few bins, but these bins are too wide to show much detail.

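We can add more bins using the `breaks` parameter. This argument accepts a vector of specific breakpoints, a function that computes the breakpoints, a single number giving the number of bins we would like, the name of an algorithm for computing the number of bins, or a function that computes the number of bins. When `breaks` is a single number, `hist()` treats it as a suggestion and rounds the breakpoints to convenient values, so the final number of bins may differ.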

For this example, we will pass the number of bins we would like. This number is context-specific based on what you are trying to show in your graph.

``hist(home_data$price, breaks = 100)``

Histogram of home prices with bin width changed. Image by Author.

With `breaks` set to 100, we have significantly more visibility into the distribution in the first few buckets.
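
Since a single number passed to `breaks` is only a suggestion, it can be worth checking how many bins R actually used. The sketch below does that on simulated `prices` (a stand-in for `home_data$price`, assumed here so the code runs on its own) and then shows one way to force an exact number of equal-width bins by passing the bin edges explicitly.

```r
# Simulated right-skewed stand-in for home_data$price (not the real data)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# A single number is a suggestion: R picks "pretty" breakpoints near it
h_suggested <- hist(prices, breaks = 100, plot = FALSE)
print(length(h_suggested$counts))  # number of bins R actually used

# To force exactly 100 equal-width bins, pass the 101 bin edges explicitly
edges <- seq(min(prices), max(prices), length.out = 101)
h_exact <- hist(prices, breaks = edges, plot = FALSE)
print(length(h_exact$counts))
```

Explicit edges give full control, at the cost of breakpoints that no longer fall on round numbers.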

We can also set `breaks` to the name of a common rule for choosing the number of bins. By default, `hist()` uses the `"Sturges"` method. Here we specify the method explicitly.

``hist(home_data$price, breaks = "Sturges")``

Histogram of home prices using the Sturges method. Image by Author.

We can also pass `"Scott"` as the value of the `breaks` argument to use the Scott method.

``hist(home_data$price, breaks = "Scott")``

Histogram of home prices using the Scott method. Image by Author.

Finally, we could also use the Freedman-Diaconis (FD) method, which `hist()` also accepts under the shorter name `"FD"`.

``hist(home_data$price, breaks = "Freedman-Diaconis")``

Histogram of home prices using the Freedman-Diaconis method. Image by Author.
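
To compare the three rules without drawing three plots, we can call the helper functions behind them directly: `nclass.Sturges()`, `nclass.scott()`, and `nclass.FD()` from the `grDevices` package, which is loaded by default. The sketch below runs them on simulated `prices` (an assumed stand-in for `home_data$price`), so the numbers it prints will not match the real dataset.

```r
# Simulated right-skewed stand-in for home_data$price (not the real data)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# Number of bins suggested by each rule
bin_counts <- c(
  "Sturges"           = nclass.Sturges(prices),
  "Scott"             = nclass.scott(prices),
  "Freedman-Diaconis" = nclass.FD(prices)
)
print(bin_counts)
```

Sturges depends only on the number of observations, while Scott uses the standard deviation and Freedman-Diaconis uses the interquartile range, so the latter two respond to how spread out the data are. As with a numeric `breaks` value, `hist()` treats each result as a suggestion.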

### Setting axis limits

We can set the x-axis limits of our plot using the `xlim` argument to zoom in on the data we are interested in. For example, it is sometimes helpful to focus on the central part of the distribution rather than on the long tail we currently see when we view the whole plot. Note that `xlim` only changes the visible range; the bins are still computed from the full data.

Changing the y-axis limits is also possible (using the `ylim` argument), but this is less often needed for histograms since the automatically calculated limits usually work well.

We will zoom in on prices between \$0 and \$2M.

``hist(home_data$price, breaks = 100, xlim = c(0, 2000000))``

Histogram of home prices with the axis limits changed. Image by Author.
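
An alternative to `xlim` is to subset the data before plotting, so that the bins are computed only from the values we want to show. The sketch below contrasts the two approaches on simulated `prices` (an assumed stand-in for `home_data$price`). The cutoff of 1,000,000 is chosen to suit the simulated values and is not a recommendation for the real dataset.

```r
# Simulated right-skewed stand-in for home_data$price (not the real data)
set.seed(42)
prices <- rlnorm(1000, meanlog = 13, sdlog = 0.5)

# Option 1: xlim only changes the visible range; bins still span all the data
hist(prices, breaks = 100, xlim = c(0, 1000000))

# Option 2: subset first, so the bins are computed from the visible data only
prices_zoom <- prices[prices <= 1000000]
hist(prices_zoom, breaks = 100)

# Report how many observations the subset leaves out
n_dropped <- length(prices) - length(prices_zoom)
print(n_dropped)
```

Subsetting gives finer bins in the zoomed region, but it silently drops observations, so it is worth stating the cutoff in the plot title or caption.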

## Next Steps in Histogram Visualization

As you get more comfortable with R, you can explore more powerful packages that make it easier to build more interesting and useful visualizations. A very popular and easy-to-use package for plotting in R is `ggplot2`. Below we create a view of the distribution of prices broken down by the number of bedrooms in the house.
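
As a rough sketch of what such a plot could look like, the code below draws one histogram panel per bedroom count with `ggplot2`. It is not the code behind the original figure. The data frame `homes` is simulated, its `bedrooms` column is an assumption (the data read earlier in this tutorial keeps only `price` and `condition`), and the plot is only drawn if `ggplot2` is installed.

```r
# Simulated stand-in data; the real dataset's values will differ
set.seed(42)
homes <- data.frame(
  price    = rlnorm(600, meanlog = 13, sdlog = 0.5),
  bedrooms = sample(2:4, size = 600, replace = TRUE)
)

# Only draw the plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # One histogram panel per number of bedrooms
  p <- ggplot(homes, aes(x = price)) +
    geom_histogram(bins = 30, fill = "blue", color = "white") +
    facet_wrap(~ bedrooms) +
    labs(x = "Price (USD)",
         y = "Number of Listings",
         title = "Distribution of House Prices by Bedrooms")
  print(p)
}
```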

Histogram of home prices using ggplot2. Image by Author.

`ggplot2` is one of the most popular ways to visualize data in R, and you can learn about using it to create histograms in the How to Make a ggplot2 Histogram in R tutorial. Check out our Introduction to ggplot2 course and our Intermediate ggplot2 course to learn how to make more interesting visualizations in R.

## Final Thoughts

In this tutorial, we learned that histograms are great visualizations for looking at distributions of continuous variables. We learned how to make a histogram in R, how to plot summary statistics on top of it, and how to customize the plot's axis titles, colors, bins, and axis limits. Finally, we demonstrated some of the power of the `ggplot2` library.
