The 10 Most Important Packages in R for Data Science

Learn about different packages in R used for data science, including how to load them and the resources you can use to advance your skills with them.
Aug 2020  · 6 min read

R is one of the most popular languages for data science, and many packages are available for different tasks. For example, dplyr and data.table handle data manipulation, ggplot2 handles data visualization, and tidyr handles data cleaning. Shiny is used to create web applications, knitr is used for report generation, and mlr3, xgboost, and caret are used in machine learning.

1. ggplot2

ggplot2 is a popular data visualization library based on the 'Grammar of Graphics'. It can build graphs with one, two, or three variables, using both categorical and numerical data. Grouping can be shown through symbol, size, color, etc. ggplot2 itself produces static 2D graphics: interactive graphics require a companion package, and 3D images can be made with plot3D.

You can install ggplot2 from CRAN by running install.packages("ggplot2") in the R console, and then load it with library(ggplot2).
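As a minimal sketch of the grouping by color and size described above, the following example plots the built-in mtcars data set. The choice of data set, columns, and axis labels is an assumption made for illustration only.

```r
library(ggplot2)

# mtcars ships with base R; the column choices here are illustrative
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl), size = hp)) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    size = "Horsepower"
  )

# Printing the object draws the plot on the active graphics device
print(p)
```

Each additional aesthetic inside aes() maps one more variable onto the same plot, which is how a third or fourth variable gets encoded.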


The following courses on DataCamp cover 'ggplot2' in more detail.

  1. Data Visualization with ggplot2 (Part 1)
  2. Data Visualization with ggplot2 (Part 2)

2. data.table

data.table is one of the fastest packages for data manipulation and can handle very large amounts of data. It is widely used in health care for genomic data and in business for predictive analytics, and it suits data sets in the range of 10 GB to 100 GB.

You can install data.table from CRAN by running install.packages("data.table") in the R console, and then load it with library(data.table).
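The DT[i, j, by] form mentioned in the tutorial title below can be sketched as follows. This is a small illustration on the built-in mtcars data set, not a benchmark; the filter threshold and column names are assumptions.

```r
library(data.table)

# Convert a data frame to a data.table, keeping row names as a column
dt <- as.data.table(mtcars, keep.rownames = "model")

# DT[i, j, by]: filter rows (i), compute columns (j), group (by)
res <- dt[hp > 100, .(mean_mpg = mean(mpg), n = .N), by = cyl]
print(res)
```

Here .N is data.table's built-in symbol for the number of rows in each group.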


You can refer to the following course and tutorial on DataCamp:

  1. Data Manipulation with data.table in R
  2. A data.table R Tutorial: Intro to DT[i, j, by].

3. dplyr

dplyr is a package for data manipulation that provides a set of verbs such as select(), arrange(), filter(), summarise(), and mutate(). It can also work with computational backends through packages like dbplyr, sparklyr, and dtplyr.

  1. You can install dplyr by installing the tidyverse package, which includes dplyr: install.packages("tidyverse").

  2. Alternatively, you can install only dplyr with install.packages("dplyr").

  3. You can load the package with library(dplyr).
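As a rough sketch of how the verbs named above chain together, the following pipeline runs on the built-in mtcars data set. The threshold, the derived column wt_kg, and the summary names are hypothetical choices for illustration.

```r
library(dplyr)

summary_tbl <- mtcars %>%
  filter(hp > 100) %>%                  # keep rows matching a condition
  select(cyl, mpg, wt) %>%              # keep only the needed columns
  mutate(wt_kg = wt * 453.6) %>%        # wt is in 1000 lbs; convert to kg
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_wt_kg = mean(wt_kg)
  ) %>%
  arrange(desc(mean_mpg))               # sort by the summary column

print(summary_tbl)
```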


The following courses on DataCamp provide detailed knowledge of dplyr.

  1. Data Manipulation with dplyr
  2. Joining Data with dplyr
  3. Introduction to the Tidyverse

4. tidyr

tidyr helps to create tidy data. A significant share of the work in a data project usually goes into cleaning and tidying the data. Tidy data is a data set in which every cell holds a single value, every row is an observation, and every column is a variable.

You can install tidyr with install.packages("tidyr") and load it with library(tidyr).
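To make the definition of tidy data concrete, here is a minimal sketch that reshapes a small wide table, where years are spread across columns, into one row per observation. The data frame and its values are made up for illustration.

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 25),
  check.names = FALSE
)

# Every column except country becomes a (year, cases) pair
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "cases"
)

print(long)
```

After reshaping, year is a variable in its own column and each row describes a single country-year observation.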


The following course on DataCamp provides detailed knowledge of tidyr.

Cleaning Data in R

5. Shiny

Shiny can be used to build interactive web applications in R without writing JavaScript. It can be combined with htmlwidgets, JavaScript actions, and CSS themes for extended features. It can be used to build dashboards as well as standalone web applications.

You can install the Shiny package with install.packages("shiny") and load it with library(shiny). Note that the package name is lowercase.
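A Shiny app pairs a user interface definition with a server function. The following is a minimal sketch, assuming a single slider that controls a histogram of the built-in mtcars data; the input and output IDs ("bins", "hist") are hypothetical names.

```r
library(shiny)

# User interface: one slider and one plot
ui <- fluidPage(
  titlePanel("Histogram"),
  sliderInput("bins", "Number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server logic: redraw the plot whenever the slider changes
server <- function(input, output) {
  output$hist <- renderPlot({
    # A single number for breaks is treated by hist() as a suggestion
    hist(mtcars$mpg, breaks = input$bins,
         main = "Miles per gallon", xlab = "mpg")
  })
}

app <- shinyApp(ui = ui, server = server)

# In an interactive R session, start the app with:
# runApp(app)
```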


You can visit the link mentioned below to learn more about Shiny.

Shiny Fundamentals with R

6. plotly

plotly is a graphing library for creating interactive graphs. It is built on the JavaScript library plotly.js.

You can install the plotly package with install.packages("plotly") and load it with library(plotly).
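As a small sketch of an interactive graph, the following builds a scatter plot of the built-in mtcars data. The data set and column choices are assumptions for illustration.

```r
library(plotly)

# The ~ prefix tells plotly to look the variable up in the data argument
fig <- plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  color = ~factor(cyl),
  type = "scatter",
  mode = "markers"
)

# Typing fig at the console (or in an R Markdown chunk) renders the
# interactive widget with hover, zoom, and pan controls.
```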


You can visit the link mentioned below to learn more about plotly.

Intermediate Interactive Data Visualization with plotly in R

7. knitr

knitr is a package for reproducible report creation and is widely used in research. It integrates R code with document formats like LaTeX, HTML, Markdown, LyX, etc. It was inspired by Sweave and extends it by combining features of add-on packages like weaver, animation, cacheSweave, etc.

You can install the knitr package with install.packages("knitr") and load it with library(knitr).
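Reports are normally written in a separate file, but the idea can be sketched inline by passing a small Markdown document with one R chunk to knit() through its text argument. The document content here is made up for illustration.

```r
library(knitr)

# Build the chunk delimiter programmatically so that this example
# does not contain a literal code fence of its own
fence <- strrep("`", 3)

src <- c(
  "# Fuel economy report",
  "",
  paste0(fence, "{r}"),
  "mean(mtcars$mpg)",
  fence
)

# knit() runs the chunk and weaves its result into the Markdown output
out <- knit(text = src, quiet = TRUE)
cat(out)
```

The output contains both the source of the chunk and the value it evaluates to, which is what makes the report reproducible.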


You can visit the link mentioned below to learn more about knitr.

Reporting with R Markdown

8. mlr3

The mlr3 package is built for machine learning. It is efficient and object-oriented: it provides 'R6' objects for the building blocks of a machine learning workflow. It is an extensible framework for clustering, regression, classification, and survival analysis.

You can install the mlr3 package with install.packages("mlr3") and load it with library(mlr3).
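The R6-based workflow can be sketched as follows: a task object wraps the data, a learner object wraps the algorithm, and methods such as $train() and $predict() act on them. This example assumes the iris task and the rpart decision tree learner that ship with mlr3; the 80/20 split and the seed are arbitrary choices.

```r
library(mlr3)

# Task: the data plus the prediction target
task <- tsk("iris")

# Learner: a classification tree (requires the rpart package)
learner <- lrn("classif.rpart")

# Hold out 20% of the rows for testing
set.seed(1)
split <- partition(task, ratio = 0.8)

learner$train(task, row_ids = split$train)
pred <- learner$predict(task, row_ids = split$test)

# Score the held-out predictions with classification accuracy
acc <- pred$score(msr("classif.acc"))
print(acc)
```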


You can visit the link mentioned below to learn more about mlr3.


9. XGBoost

XGBoost is an implementation of the gradient boosting framework. It provides an R interface, and the model is also available through R's caret package. It is built for speed and performance and is described as faster than the gradient boosting implementations in H2O, Spark, and Python. The package's primary use cases are machine learning tasks like classification, ranking, and regression.

You can install the XGBoost package with install.packages("xgboost") and load it with library(xgboost). Note that the package name is lowercase.
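As a minimal classification sketch, the following trains a small boosted model on the agaricus (mushroom) data bundled with the package. The parameter values and number of rounds are arbitrary assumptions, not tuned settings.

```r
library(xgboost)

# Example data shipped with the xgboost package
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

# xgb.DMatrix is xgboost's own container for features and labels
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

params <- list(objective = "binary:logistic", max_depth = 2)
model <- xgb.train(params = params, data = dtrain, nrounds = 10)

# Predictions are probabilities; threshold at 0.5 to get classes
pred <- predict(model, dtest)
accuracy <- mean((pred > 0.5) == agaricus.test$label)
print(accuracy)
```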


You can visit the link mentioned below to learn more about XGBoost.

Extreme Gradient Boosting with XGBoost

10. caret

caret is short for Classification And Regression Training. It is used for predictive modeling and provides tools for the following steps.

  1. Pre-processing: The data is pre-processed and checked for missing values. caret provides preProcess() for this task.
  2. Data splitting: The data is split into training and test sets with similar class distributions, for example with createDataPartition().
  3. Feature selection: A suitable technique can be used, such as recursive feature elimination with rfe().
  4. Model training: caret provides a common interface, train(), to many machine learning algorithms from other packages.
  5. Resampling for model tuning: The model can be tuned using k-fold or repeated k-fold cross-validation, etc. The number of candidate parameter values can be set using 'tuneLength'.
  6. Variable importance estimation: varImp() can be used on many models to obtain variable importance estimates.

You can install the caret package with install.packages("caret") and load it with library(caret).
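Several of the steps listed above can be sketched in one short workflow. This example assumes the built-in iris data and an rpart decision tree; the split ratio, resampling settings, and seed are arbitrary choices, and feature selection is left out for brevity.

```r
library(caret)

set.seed(42)

# Data splitting: stratified 80/20 split on the outcome
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set <- iris[-idx, ]

# Resampling: 5-fold cross-validation repeated twice
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2)

# Training with pre-processing and tuning in a single call
fit <- train(
  Species ~ .,
  data = train_set,
  method = "rpart",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneLength = 5
)

# Evaluate on the held-out rows
pred <- predict(fit, newdata = test_set)
acc <- mean(pred == test_set$Species)

# Variable importance estimation
imp <- varImp(fit)
print(imp)
```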


You can visit the link mentioned below to learn more about caret from its author, Max Kuhn.

Machine Learning with caret in R
