RStudio is a must-know tool for everyone who works with the R programming language. It's used in data analysis to import, access, transform, explore, plot, and model data, and for machine learning to make predictions on data.
If you're just getting started with learning R, it's high time for you to find out what RStudio is and how to install it and begin using it. This is exactly where this RStudio tutorial can come in handy. So, let's dive in.
What is RStudio?
Before discussing what RStudio is and why to use it, let's first give a definition of R.
R is a popular programming language and free and open-source software used in data analysis and data science. It's especially powerful in performing advanced statistical computing and creating compelling plots. R provides more than 18,000 dedicated data science packages (as of September 2022), both multipurpose and narrowly-specialized ones. It’s a technology that’s well-supported by an active and helpful online community and is compatible with various operating systems.
If you want to find more information about R and how to learn it, take a look at our resources:
- What is R? – The Statistical Computing Powerhouse
- How to Get Started with R
- Introduction to R course
RStudio is a flexible and multifunctional open-source IDE (integrated development environment) that is widely used as a graphical front end for R (version 3.0.1 or higher). It also supports other languages, such as Python and SQL.
RStudio offers numerous helpful features:
- A user-friendly interface
- The ability to write and save reusable scripts
- Easy access to all the imported data and created objects (like variables, functions, etc.)
- Exhaustive help on any object
- Code autocompletion
- The ability to create projects to organize and share your work with your collaborators more efficiently
- Plot previewing
- Easy switching between terminal and console
- Operational history tracking
- Plenty of articles from RStudio Support on how to use the IDE
How to Install RStudio
To install and start working in RStudio, we first need to download and install the R programming language itself. To do that, follow the steps below:
- Open The Comprehensive R Archive Network (CRAN), which is the official distribution site for R.
- In the upper part of the screen, find the section Download and Install R.
- Click the link corresponding to your operating system.
- Select the latest release.
- Open the downloaded file and follow the installation instructions, keeping the default options.
To download and install RStudio, follow these steps:
1. Open the download page of the official RStudio website.
2. Scroll down to the download buttons for RStudio Desktop.
3. Click DOWNLOAD RSTUDIO DESKTOP.
4. Click DOWNLOAD under RStudio Desktop.
5. You'll see that your operating system is automatically identified. Press the big button to download the latest release of RStudio for your operating system.
6. Open the downloaded file and follow the installation instructions, keeping the default options.
How to Use RStudio
Now that we've successfully installed RStudio, let's open it, explore its main parts, and try to perform various operations in it.
Opening RStudio automatically launches R.
Roughly, we can divide the working window into three areas:
- Left area: includes the tabs Console, Terminal, and Background Jobs
- Top-right area: includes the tabs Environment, History, Connections, and Tutorial
- Bottom-right area: includes the tabs Files, Plots, Packages, Help, Viewer, and Presentation
Note: the above layout, including the tab names and their distribution, corresponds to RStudio version 2022.07.1+554. It may vary slightly in other versions.
Let's take a closer look at the essential tabs.
Console
On this tab, we first see information about the R version in use, along with some basic commands to try. Below that text, we can type R code, press Enter, and get the result under the code line (e.g., try running 2*2 and see what happens). We can do virtually anything here that we would do in any other R program, for example:
- Installing and loading R packages
- Performing simple or complex mathematical operations
- Assigning the result of an operation to a variable
- Importing data
- Creating common types of R objects, like vectors, matrices, or dataframes
- Exploring data
- Statistical analysis
- Building data visualizations
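A few of these operations can be sketched as console input. This is only an illustration: the variable names (result, scores) and the numbers are made up for the example and aren't part of any dataset used in this tutorial.

```r
# Simple arithmetic, typed directly at the console prompt
2 * 2

# Assign the result of an operation to a variable, then display it
result <- sqrt(16) + 3
result

# Create a numeric vector (hypothetical values) and explore it
scores <- c(4, 8, 15, 16, 23, 42)
length(scores)
mean(scores)
summary(scores)
```

Type each line after the > prompt and press Enter to see its output directly below.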
However, code that we run directly in the console isn't saved for later reuse. If we need to write reproducible code to solve a specific task (and we usually do), we have to record it in a script file, and save that file regularly, rather than type it into the console.
We'll soon explore how to write scripts. For now, keep in mind that the console is best used for testing code and for installing R packages, since packages only need to be installed once.
Environment
Whenever we define a new variable or reassign an existing one in RStudio, it's stored as an object in the workspace and gets displayed, together with its value, on the Environment tab in the top-right area of the RStudio window. Try running greeting <- "Hello, World!" in the console and see what happens on the Environment tab.
The same applies to more complex objects such as dataframes. When we import data as a dataframe (or create a dataframe from scratch), we see in the workspace not only the name of the new object but also the values and data type of each column. Moreover, we can display even more details about each object, such as its length and memory size.
In the example below, we created two variables in the console: greeting <- "Hello, World!" and my_vector <- c(1, 2, 3, 4). Note how they are displayed on the Environment tab:
We can change the way of displaying our variables from List to Grid in the top-right corner of the tab, as follows:
Note that now we can also see the length and size of each object.
In the Grid display mode, a checkbox appears to the left of each variable. We can tick any of those boxes and click the Broom icon to remove the corresponding objects from the workspace.
If we tick the box to the left of the Name column and click the Broom icon, or if we just click this icon in the previous display mode (List), we'll clear the workspace, removing all the variables from it.
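What the Environment tab shows can also be inspected from the console. The following is a minimal sketch using the two variables from the example above; it relies only on base R functions (ls(), length(), class(), object.size(), rm(), exists()).

```r
# Recreate the two example variables
greeting <- "Hello, World!"
my_vector <- c(1, 2, 3, 4)

# List the names of the objects currently in the workspace
ls()

# Details that the Grid view displays for each object
length(my_vector)        # number of elements
class(greeting)          # type of the object
object.size(my_vector)   # memory used by the object

# Remove a single object (similar to ticking its box and clicking the Broom icon)
rm(greeting)
exists("greeting")       # check whether the object is still defined

# Note: rm(list = ls()) would remove every object from the workspace at once
```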
Other important tabs
- Terminal – to run commands from the terminal
- History – to track the history of all the operations performed during the current RStudio session
- Files – to see the structure of the working folder, reset the working folder, navigate between folders, etc.
- Plots – to preview and export created data visualizations
- Packages – to check what packages were loaded and load or unload packages (by switching on/off the box to the left of a package name)
How to Write R Scripts in RStudio
As we mentioned earlier, if we want to be able to reproduce and reuse our code for further needs, we should write it in a script file rather than directly in the console.
To start recording a script, click File – New File – R Script. This will open a text editor in the top-left corner of the RStudio interface (above the Console tab):
In a script, we can do all the things we listed in the section on the console (and we can actually do the same things in any other R IDE), except that now our actions are stored in a file for further use or sharing. It's important to give the script file a meaningful name and to save it regularly (Ctrl + S in Windows/Linux, Cmd + S in Mac, File – Save in any operating system).
To run a single line of code from a script, put the cursor on that line and click the Run icon at the top right of the text editor, or use the keyboard shortcut (Ctrl + Enter in Windows/Linux, Cmd + Enter in Mac). To run multiple lines, select them first and then do the same. To run all the lines, select the whole script (Ctrl + A in Windows/Linux, Cmd + A in Mac) and then click the Run icon or press Ctrl + Enter (Cmd + Enter in Mac). Be careful not to press Enter alone while everything is selected, since that would replace the selected code with a blank line.
When we write a script, it makes sense to add code comments where necessary (using the hash symbol # followed by the comment text) to explain to a potential future reader the reasoning behind certain pieces of code.
Also, it's a good idea to add some important context at the beginning of the script: the author and contributors of the code, when it was written, when it was updated, the scope of the code, etc. Another helpful practice is to load all the needed R packages at the beginning of the script, just after providing the initial information.
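These practices could look like the following script skeleton. It's only a sketch: the file name, author, and dates are placeholders, and the analysis itself (average CO2 uptake per treatment, computed on the built-in CO2 dataset) is an arbitrary example chosen so that the script runs without any extra packages.

```r
# ------------------------------------------------------------
# Script:  co2_uptake_summary.R   (hypothetical file name)
# Author:  Your Name              (placeholder)
# Created: YYYY-MM-DD             (placeholder)
# Updated: YYYY-MM-DD             (placeholder)
# Purpose: Summarize plant CO2 uptake by treatment
# ------------------------------------------------------------

# Load all the needed packages right after the header.
# stats and datasets ship with R, so nothing has to be installed.
library(stats)
library(datasets)

# Average per treatment rather than overall, because the two groups
# (chilled vs. nonchilled) are what the experiment compares.
uptake_by_treatment <- aggregate(uptake ~ Treatment, data = CO2, FUN = mean)

# Show the summary table
print(uptake_by_treatment)
```

Note that the comments explain why a step is done, not just what the code does.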
How to Perform Various Operations in RStudio
Next, we'll discuss what actions we can perform in RStudio for data analysis purposes. Virtually all the operations we're going to consider relate to using R in general, in whatever IDE, rather than to RStudio specifically.
Hence, we aren't going to take a granular look at all the technical details of those operations. Instead, we'll see some common tasks, their practical implementation in R (code examples), and alternative approaches (where applicable) to those tasks in RStudio.
Copy and paste the examples below into the RStudio console and explore the results. Consider trying both the general and the alternative (RStudio-specific) approaches.
Installing R Packages
- Remember to install all packages in the console rather than in a script file since they have to be installed on a computer's hard disk only once.
- You can also install packages directly from the RStudio interface: open the Packages tab (the bottom-right area), click Install, and enter the names of the necessary CRAN packages, separated with a space or comma.
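From the console, installation is done with install.packages(), passing the package name in quotation marks, e.g., install.packages("readr"). The sketch below goes one step further and installs only what is missing. The package names are just the ones mentioned later in this tutorial, and the explicit repos argument (the CRAN cloud mirror) is an assumption added so that the call also works outside an interactive session.

```r
# Packages we want to have available (run this once, in the console)
wanted <- c("readr", "dplyr")

# Check which of them are already installed
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
to_install <- wanted[!is_installed]

# Install only the missing ones; note the names are passed as quoted strings
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```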
Loading R Packages
Note that while package names go in quotation marks in install.packages(), library() accepts them without quotation marks.
- Load all the necessary packages in a script file rather than in the console.
- Loading/unloading installed or system packages can be done by searching and ticking/unticking those packages on the Packages tab. Note that some packages can't be unloaded if they were imported by other packages.
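As a minimal sketch of loading and unloading from code, the example below uses splines, chosen only because it's bundled with R and therefore needs no installation. Any installed package can be loaded in the same way.

```r
# Load a package: the name is given without quotation marks
library(splines)

# TRUE if the package is now attached to the search path
is_attached <- "package:splines" %in% search()
print(is_attached)

# Detach it again (roughly what unticking its box on the Packages tab does)
detach("package:splines")
```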
Checking Loaded R Packages
Run (.packages()) or search() in the console to get a list of all the loaded packages.
In RStudio: open the Packages tab, search for a specific package, and check if the box to the left of its name is ticked.
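Both checks can be sketched as follows. The tools package is loaded first only to have something visible in the output; it ships with R, so no installation is assumed.

```r
# Attach a package so that it shows up in the lists below
library(tools)

# Names of the attached packages
loaded <- (.packages())
print(loaded)

# The full search path, where attached packages appear as "package:<name>"
search()

# Check for one specific package
"tools" %in% loaded
```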
Getting Help on an R Package or any Built-in R Object
To get help on an installed and loaded package, or a function of an installed and loaded package, or any other built-in R object (such as a preloaded dataset), use one of the following syntaxes:
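The forms in question can be sketched with objects that are always available in base R. The choice of mean, CO2, and stats here is purely illustrative.

```r
# Help on a function: pass the name without parentheses
help(mean)

# The question-mark shorthand does the same
?mean

# Help on a preloaded dataset
help("CO2")

# Overview of a whole package, including the index of its help topics
help(package = "stats")
```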
Note: when asking for help on a function, we need to pass its name to the help function without the trailing parentheses (e.g., help(mean), not help(mean())).
The Help tab will be opened with the package or object documentation. If we're checking a package, then we'll get the list of all its functions and the link to the documentation for each of them.
For example, run the following in the console (after making sure that the readr and dplyr packages are installed and loaded):
help("read.csv")
?readr
help(help)
help('CO2')
In RStudio: find and click the desired package name (even if it isn't loaded) on the Packages tab and see the result on the Help tab.
Importing Data in RStudio
world_population <- read.csv("world_population.csv")
(To run the above piece of code, first download the publicly available World Population Dataset from Kaggle and unzip it into your working folder, e.g., the folder where you store your R script. Run getwd() in the console to check which folder R currently treats as the working folder.)
The result of running the above piece of code will be an R dataframe stored in your workspace.
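To try the import step without downloading anything, the round trip can be sketched with a small made-up file. The sample values and the file name below are hypothetical and have nothing to do with the Kaggle dataset.

```r
# Hypothetical sample data standing in for a downloaded CSV file
sample_data <- data.frame(
  country = c("A", "B", "C"),
  population = c(100, 250, 75)
)

# Write it to a temporary folder so the example leaves no files behind
csv_path <- file.path(tempdir(), "sample_population.csv")
write.csv(sample_data, csv_path, row.names = FALSE)

# Relative file names are resolved against this folder
getwd()

# Import the CSV file as a dataframe and inspect it
imported <- read.csv(csv_path)
str(imported)
head(imported)
```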
In RStudio, we can also import a dataset through the interface in one of two ways:
- File – Import Dataset
- Click Import Dataset on the Environment tab
Then select From Text (base)..., navigate to the right folder, select the file to import, fill in or check the fields Name, Heading, Separator, and Decimal in the pop-up window, preview the dataset structure, and click Import:
You can now find and explore the imported dataset on the Environment tab and in a spreadsheet opened in a new tab:
If you want to learn more about how to import data with R, explore a well-rounded DataCamp skill track Importing & Cleaning Data with R. You can find various datasets to import and work with on DataCamp Workspace.
Accessing Built-in R Datasets
To see the full list of available sample datasets preloaded in R, including their names and short descriptions, run the following piece of code in the console:
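A minimal sketch of that step is shown below. The data() function with no arguments lists the datasets of all attached packages. The second part, which stores the listing of the datasets package in a variable named available, is an addition for illustration.

```r
# Show the names and short descriptions of the available datasets
data()

# The same information as a matrix, restricted to the datasets package
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Check whether a particular dataset is included
"CO2" %in% available[, "Item"]
```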
Any of the listed names can be used as a variable (containing a dataframe) to work with and practice your skills in R.
If you need more information about a particular preloaded dataset, run the help() function on it, e.g., help(CO2).
Wrangling and Analyzing Data in RStudio
Like in any other R IDE, in RStudio, we can access, manipulate, transform, analyze, and model the data in R. Below are some examples of standard operations performed on the CO2 built-in dataset:
head(CO2)
tail(CO2)
colnames(CO2)
dim(CO2)
str(CO2)
summary(CO2)
summary(CO2$uptake)
median(CO2$uptake)
class(CO2$uptake)
unique(CO2$Treatment)
subset(CO2, conc == min(CO2$conc))
Try running them one by one in RStudio and observe the output.
Plotting Data in RStudio
Like in any other R IDE, in RStudio, we can plot the data. Below are some examples of creating simple plots for the CO2 and Orange built-in datasets. In both cases, the resulting plot appears on the Plots tab and can be exported using the Export button of that tab:
- Creating a histogram:
- Creating a scatter plot:
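The two plots can be sketched with base R graphics as follows. The choice of columns is an assumption: uptake from CO2 for the histogram, and age versus circumference from Orange for the scatter plot, which are the same variables used in the tuned plot further down.

```r
# Histogram of the CO2 uptake values
hist(CO2$uptake)

# Scatter plot of tree circumference against age
plot(Orange$age, Orange$circumference)
```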
We can tune a few parameters available for the basic plot() function to add some aesthetics to the last plot:
plot(Orange$age, Orange$circumference, xlab="Age", ylab="Circumference", main="Circumference vs. Age", col="blue", pch=16)
Or we can use ggplot2 or any other specialized data visualization package of which R offers a vast choice. The DataCamp skill track Data Visualization with R can be a good point to start mastering your plotting skills in R.
Creating Data from Scratch in R
Again, in this case, RStudio isn't different from any other R IDE.
To create a vector:
oceans <- c("Arctic", "Atlantic", "Indian", "Pacific", "Southern")
avg_depth <- c(1.2, 3.65, 3.74, 3.97, 3.27)
(The above data was taken from Wikipedia.)
To create a dataframe:
oceans_depth <- data.frame(oceans, avg_depth)
Printing out the result:
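The printing step can be sketched as follows. The definitions are repeated so that the snippet runs on its own, and the str() call is an extra added to show the column types.

```r
# Vectors with the ocean names and their average depths
oceans <- c("Arctic", "Atlantic", "Indian", "Pacific", "Southern")
avg_depth <- c(1.2, 3.65, 3.74, 3.97, 3.27)

# Combine the vectors into a dataframe, one column per vector
oceans_depth <- data.frame(oceans, avg_depth)

# Print the dataframe and inspect its structure
print(oceans_depth)
str(oceans_depth)
```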
The resulting vectors and dataframe also appear on the Environment tab of RStudio.
In this tutorial, we explored plenty of essential aspects of using RStudio:
- What RStudio is and what advantages it has
- How to install RStudio
- What the RStudio interface looks like and how to use its main parts
- The difference between running code in the console and a script
- Where to find all the objects used in the current RStudio session
- The best practices for writing scripts
- How to perform various operations in RStudio, such as installing and loading R packages, importing data, wrangling, analyzing, and visualizing data, creating R objects from scratch, etc.
Now that you're familiar with RStudio, you can go ahead and start using it. For example, think about building your own R projects in RStudio. For more inspiration, check the article on The Top 10 R Project Ideas for 2022.
If you feel that you need more training in R before starting to create projects in RStudio, consider the following beginner-friendly and exhaustive online career and skill tracks and courses of DataCamp: