Blog

How to Import Data Into R: A Tutorial

Find out how to import data into R, including CSV, JSON, Excel, HTML, databases, SAS, SPSS, Matlab, and other files using the popular R packages.

Updated Jan 2023 · 20 min read

Loading data into R can be quite frustrating. Almost every single type of file that you want to get into R seems to require its function, and even then, you might get lost in the functions’ arguments. In short, it can be fairly easy to mix up things from time to time, whether you are a beginner or a more advanced R user.

To cover these needs, we’ve created a comprehensive yet easy tutorial on how to import data into R, going from simple text files to more advanced SPSS and SAS files. Keep reading to find out how you can easily import your files into R!

Practice importing data Into R with this hands-on exercise.

Run and edit the code from this tutorial online

Run Code

Getting Started with Importing Data into R

R is a powerful statistical tool. Compared to other software like Microsoft Excel, R provides us with faster data loading, automated data cleaning, and in-depth statistical and predictive analysis. It is all done by using open-source R-Packages, and we are going to learn how to use them to import various types of datasets.

We will be using DataLab for running code examples. It comes with pre-installed packages and the R environment. You don’t have to set anything up and start coding within seconds. It is a free service and comes with a large selection of datasets. You can also integrate your SQL server to start performing exploratory data analysis.

After loading the DataLab workbook, you need to install a few packages that are not popular but are necessary to load SAS, SPSS, Stata, and Matlab files.

Note: Make sure to install dependencies by using the `dependency=T` parameter in `install.packages` function.

The Tidyverse package comes with various packages that allow you to read flat files, clean data, perform data manipulation and visualization, and much more.

install.packages(c('quantmod','ff','foreign','R.matlab'),dependency=T)

suppressPackageStartupMessages(library(tidyverse))

In this tutorial, we will learn to load commonly used CSV, TXT, Excel, JSON, Database, and XML/HTML data files in R. Moreover, we will also look at less commonly used file formats such as SAS, SPSS, Stata, Matlab, and Binary.

Commonly used Data Types

We will be learning about all popular data formats and loading them using various R packages. Furthermore, we will use URLs to scrape HTML tables and XML data from the website with few lines of code.

Importing data to R from a CSV and TXT files

Importing a CSV file in R

In this section, we will read data in r by loading a CSV file from Hotel Booking Demand. This dataset consists of booking data from a city hotel and a resort hotel. To import the CSV file, we will use the readr package’s `read_csv` function. Just like in Pandas, it requires you to enter the location of the file to process the file and load it as a dataframe.

You can also use the `read.csv` or `read.delim` functions from the utils package to load CSV files.

data1 <- read_csv('data/hotel_bookings_clean.csv',show_col_types = FALSE)
head(data1, 5)

Similar to `read_csv` you can also use the read.table function to load the file. Make sure you are adding delimiter “,” and header = 1. It will set the first row as column names instead of “V1”, “V2”,...

data2 <- read.table('data/hotel_bookings_clean.csv', sep=",", header = 1)
head(data2, 5)

Importing a TXT file in R

In this part, we will use the Drake Lyrics dataset to load a text file. The file consists of Lyrics from the singer Drake. We can use the `readLines` function to load the simple file, but we have to perform additional tasks to convert it into a dataframe.

Image by Author | Text file

We will use read.table’s alternative function, `read.delim` to load the text file as an R dataframe. Other read.table’s alternative functions are read.csv, read.csv2, and read.delim2.

Note: by default, it is separating the values on Tab (sep = "\t")

The text file consists of lyrics and doesn't have a header row. To display all of the lyrics in a row, we need to set `header = F`.

You can also use other parameters to customize your dataframe, for example, the fill parameter, which sets the blank field to be added to rows of unequal length.

Read the documentation to learn about every parameter in the read.table’s alternative functions.

data3 <- read.delim('data/drake_lyrics.txt',header = F)
head(data3, 5)

Importing data from Excel into R

In this section, we will be using the Tesla Deaths dataset from Kaggle to import from Excel into R. The dataset is about tragic Tesla vehicle accidents that have resulted in the death of a driver, occupant, cyclist, or pedestrian.

The dataset contains a CSV file, and we will use MS excel to convert it into an Excel file, as shown below.

Image by Author

We will use the readxl package’s `read_excel` function to read a single sheet from an Excel file. The package comes with tiddyvers, but it is not the core part of it, so we need to load the package before using the function.

The function requires the location of the data and sheet number. We can also modify how our dataframe looks by reading the description of other parameters in read_excel documentation.

library(readxl)
data4 <- read_excel("data/Tesla Deaths.xlsx", sheet = 1)
head(data4, 5)

Importing data from a JSON file

In this part, we will load JSON into R using a file from the Drake Lyrics dataset. It contains lyrics, song title, album title, URL, and view count of Drake songs.

Image by Author

To load a JSON file, we will load the rjson package and use `fromJSON` to parse the JSON file.

library(rjson)
JsonData <- fromJSON(file = 'data/drake_data.json')
print(JsonData[1])

Output:

[[1]]
[[1]]$album
[1] "Certified Lover Boy"

[[1]]$lyrics_title
[1] "Certified Lover Boy* Lyrics"

[[1]]$lyrics_url
[1] "https://genius.com/Drake-certified-lover-boy-lyrics"

[[1]]$lyrics
[1] "Lyrics from CLB Merch\n\n[Verse]\nPut my feelings on ice\nAlways been a gem\nCertified lover boy, somehow still heartless\nHeart is only gettin' colder"

[[1]]$track_views
[1] "8.7K"

To convert the JSON data into an R dataframe, we will use data.table package’s `as.data.frame` function.

data5 = as.data.frame(JsonData[1])
data5

Importing data from a Database using SQL in R

In this part, we are going to use Mental Health in the Tech Industry dataset from Kaggle to load SQLite databases using R. To extract the data from databases using SQL query, we will use the DBI package and SQLite function and create the connection. You can also use similar syntax to load data from other SQL servers.

We will load the RSQLite package and load the database using the dbConnect function.

Note: you can use dbConnect to load data from MySQL, PostgreSQL, and other popular SQL servers.

After loading the database, we will display the names of the tables.

library(RSQLite)
conn <- dbConnect(RSQLite::SQLite(), "data/mental_health.sqlite")
dbListTables(conn)
# 'Answer''Question''Survey'

To run a query and display the results, we will use the `dbGetQuery` function. Just add a SQLite connection object and SQL query as a string.

dbGetQuery(conn, "SELECT * FROM Survey")

Using SQL within R provides you with greater control of data ingestion and analysis.

data6 = dbGetQuery(conn, "SELECT * FROM Question  LIMIT 3")
data6

Learn more about running SQL queries in R by following the How to Execute SQL Queries in Python and R tutorial. It will teach you how to load databases and use SQL with dplyr and ggplot.

Importing data from XML and HTML files

Importing XML into R

In this section, we will load plant_catalog XML data from w3schools using xml2 package.

Note: You can also use the XML package’s `xmlTreeParse` function to load the data.

Just like the `read_csv` function, we can load the XML data by providing a URL link to the XML site. It will load the page and parse XML data.

library(xml2)
plant_xml <- read_xml('https://www.w3schools.com/xml/plant_catalog.xml')
plant_xml_parse <- xmlParse(plant_xml)

Later, you can convert XML data to an R data frame using the `xmlToDataFrame` function.

1. Extract node set from XML data.

- The first argument is a doc that is the object of the class `XMLInternalDocument`.
- The second argument is a path, which is a string giving the XPath expression to evaluate.

2. Add the `plant_node` to the `xmlToDataFrame` function and display the first five rows of the R dataframe.

plant_nodes= getNodeSet(plant_xml_parse, "//PLANT")

data9 <- xmlToDataFrame(nodes=plant_nodes)
head(data9,5)

Importing HTML Table into R

This section is fun as we will be scraping the Wikipedia page of Argentina national football team to extract the HTML table and convert it into a data frame with few lines of code.

Image from Wikipedia

To load an HTML table, we will use XML and RCurl packages. We will provide the Wikipedia URL to the `getURL` function and then add the object to the `readHTMLTable` function, as shown below.

The function will extract all of the HTML tables from the website, and we just have to explore them individually to select the one we want.

library(XML)
library(RCurl)
url <- getURL("https://en.wikipedia.org/wiki/Brazil_national_football_team")
tables <- readHTMLTable(url)
data7 <- tables[31]
data7$`NULL`

Moreover, you can use the rvest package to read HTML using URL, extract all tables, and display it as a dataframe.

read_html(URL) for extracting HTML data from websites.
html_nodes(file, "table") for extracting tables from HTML data.
html_table(tables[25]) for converting HTML tables to dataframe.

library(rvest)
url <- "https://en.wikipedia.org/wiki/Argentina_national_football_team"
file<-read_html(url)
tables<-html_nodes(file, "table")
data8 <- html_table(tables[25])
View(data8)

If you are facing issues while following the tutorial, you can always check out the DataLab workbook with all the code for this tutorial. Just make a copy and start practicing.

Other Data Types

The other less popular but essential data types are from statistical software, Matlab, and binary data.

Importing data from a SAS file

In this section, we will use the haven package to import SAS files. You can download the data from the GeeksforGeeks blog. The haven package allows you to load SAS, SPSS, and Stata files into R with minimal code.

Provide the file directory to the `read_sas` function to load the `.sas7bdat` file as a dataframe. Read the function’s documentation to learn more about it.

library(haven)
 
data10 <- read_sas('data/lond_small.sas7bdat')

# display data
head(data10,5)

Importing data from a SPSS file

As we already know, we can also use the haven package to load SPSS files into R. You can download the data from the GeeksforGeeks blog and use the `read_sav` function just to load the SPSS sav file.

It requires the file directory as a string and you can modify your dataframe by using additional arguments such as encoding, col_select, and compress.

library(haven)
 
data11 <- read_sav("data/airline_passengers.sav")                       
head(data11,5)

You can also use a foreign package to load a `.sav` file as a dataframe using the `read.spss` function. The function requires only two arguments file and to.data.frame. Learn more about other arguments by reading the function's documentation.

Note: foreign package also allows you to load Minitab, S, SAS, SPSS, Stata, Systat, Weka, and Octave file formats.

library("foreign")
 
data12 <- read.spss("data/airline_passengers.sav", to.data.frame = TRUE)
head(data12,5)

Importing Stata Files into R

In this part, we will use the foreign package to load the Stata file from ucla.edu.

The read.dta reads a file in Stata version 5--12 binary formats and converts it into a data frame.

“It is that simple.“

library("foreign")
data13 <- read.dta("data/friendly.dta")
head(data13,5)

Importing Matlab Files into R

Matlab is quite famous among students and researchers. The R.matlab allows us to load the`.mat` file, so that we can perform data analysis and run simulations in R.

Download the Matlab files from Kaggle to try it on your own.

library(R.matlab)
data14 <- readMat("data/cross_dsads.mat")
head(data14$data.dsads)

Importing Binary Files into R

In this part, we will first create a binary file and then read the file using the `readBin` function.

Note: the code example is a modified version of the Working with Binary Files in R Programming blog.

First, we need to create a dataframe with four columns and four rows.

df = data.frame(
  "ID" = c(1, 2, 3, 4),
  "Name" = c("Abid", "Matt", "Sara", "Dean"),
  "Age" = c(34, 25, 27, 50),
  "Pin" = c(234512, 765345, 345678, 098567)
)

After that, create a connection object using a `file` function.

con = file("data/binary_data.dat", "wb")

Write the column names into the file using the `writeBin` function.

writeBin(colnames(df), con)

Write the value of each column into the file.

writeBin(c(df$ID, df$Name, df$Age, df$Pin), con)

Close the connection after writing the data into the file.

close(con)

To read the binary file, we have to create a connection to the file and use the `readBin` function to display data as an integer.

Arguments used in the function:

conn: a connection object.
what: an object whose mode will give the mode of the vector to be read.
n: The number of records to be read.

con = file("data/binary_data.dat", "rb")
data15_1 = readBin(con, integer(), n = 25)
print(data15_1)

Output:

[1] 1308640329    6647137    6645569    7235920    3276849    3407923
[7] 1684628033 1952533760 1632829556 1140875634    7233893  838874163
[13]  926023733    3159296  892613426  922759729  875771190  875757621
[19]  943142453  892877056

You can also convert the data from binary to string by replacing “integer()” with “character()” in the `what` argument.

Read the readBin function documentation to learn more.

con = file("data/binary_data.dat", "rb")
data15_2 = readBin(con, character(), n = 25)
print(data15_2)

Output:

[1] "ID"     "Name"   "Age"    "Pin"    "1"      "2"      "3"      "4"    
[9] "Abid"   "Matt"   "Sara"   "Dean"   "34"     "25"     "27"     "50"   
[17] "234512" "765345" "345678" "98567"

Learn how to import flat files, statistical software, databases, or data right from the web by taking Intermediate Importing Data in R course.

Importing Data using QuantMod into R

The quantmod is a financial modeling and trading framework for R. We will be using it to download and load the latest trading data in the form of a dataframe.

We will use quantmod’s `getSymbols` function to load Google stock's historical data by providing “from” and “to” date and “frequency”. Learn more about the quantmod package by reading the documentation.

library(quantmod)
getSymbols("GOOGL",
          from = "2022/12/1",
          to = "2023/1/15",
          periodicity = "daily")
# 'GOOGL'

The data is loaded in a `GOOGL` object, and we can view the first five rows by using the `head()` function.

head(GOOGL,5)

Output:

GOOGL.Open GOOGL.High GOOGL.Low GOOGL.Close GOOGL.Volume
2022-12-01     101.02     102.25    100.25      100.99     28687100
2022-12-02      99.05     100.77     98.90      100.44     21480700
2022-12-05      99.40     101.38     99.00       99.48     24405100
2022-12-06      99.30      99.78     96.42       96.98     24910700
2022-12-07      96.41      96.88     94.72       94.94     31045400
          GOOGL.Adjusted
2022-12-01         100.99
2022-12-02         100.44
2022-12-05          99.48
2022-12-06          96.98
2022-12-07          94.94

Importing Large Datasets into R

Importing a large file is tricky. You need to make sure that the function is optimized for memory-efficient storage and fast access.

In this section, we will be looking at popular functions used for loading CSV files greater than 1 GB. We are using the US Accidents (2016 - 2021) dataset from Kaggle, which is about 1.15GB in size and has 2,845,342 records.

Import a Large Data using read.table

We can load the zip file directly into utils’s read.table function by using the `unz` function. It will save you time to extract and then load the CSV file.

Provide unz with zip directory and the CSV file within zip.
After that add the unz object to the `read.table` function.
To load data faster, we will be limiting the nrows to 10,000.
By default, it separates the values on space, so make sure you are providing a sep argument with a comma.

file <- unz("data/US Accidents.zip", "US_Accidents_Dec21_updated.csv")
data16 <- read.table(file, header=T, sep=",",nrow=10000)
data16

Import a Large Data using read.csv2

Similar to `read.table`, we can use readr’s `read_csv` function to load the CSV file. Instead of nrow, we will use n_max to read a limited number of records.

In our case, we are not restricting any data and allowing the function to load full data.

Note: it almost took a minute to load full data. You can change the number of threads to reduce loading time. Learn more by reading the function documentation.

data17 <- read_csv('data/US_Accidents_Dec21_updated.csv')
data17

Import a Large Data using ff Package

We can also use the ff package to optimize loading time and storage. The read.table.ffdf function loads data in chunks reducing the loading time.

We will first unzip the file and read the data using the `read.table.ffdf` function.

unzip('data/US Accidents.zip',exdir='data')
library(ff)
data18 <- read.table.ffdf(file="data/US_Accidents_Dec21_updated.csv",
                          nrows=10000,
                          header = TRUE,
                          sep = ',')
data18[1:5,1:25]

In the end, we will look at the most commonly used function `fread` from the data.table package to read the first 10,000 rows. The function can automatically understand file format, but in rare cases, you have to provide a sep argument.

library(data.table)
data19 <- fread("data/US_Accidents_Dec21_updated.csv",
              sep=',',
              nrows = 10000,
              na.strings = c("NA","N/A",""),
              stringsAsFactors=FALSE
              )
data19

Dataset Reference

If you want to try the code examples on your own, here is the list of all of the datasets used in the tutorial.

CSV: Hotel Booking Demand
TXT: Drake Lyrics
Excel: Tesla Deaths
JSON: Drake Lyrics
SQL DB: Mental Health in the Tech Industry
XML: w3schools
HTML: Argentina national football team
SAS: SAS Files - GeeksforGeeks
SPSS: SPSS Files - GeeksforGeeks
Stata: Applied Regression Analysis by Fox Data Files
Matlab: Cross-position activity recognition | Kaggle
Large Dataset: US Accidents (2016 - 2021)

Conclusion

R is an amazing language, and it comes with all kinds of integration. You can load any type of dataset, clean and manipulate it, perform exploratory and predictive data analysis, and publish high-quality reports.

In this tutorial, we have learned how to load all kinds of datasets using the popular R packages for better storage and performance. If you want to start your data science career with R, check out Data Scientist with R career track. It consists of 24 interactive courses that will teach you everything about R programming, statistical analysis, data handling, and predictive analysis. Moreover, you can take a certification exam after completing the track to enter the job market.

Also check out the Import Data Into R DataLab workbook which comes with source code, outputs, and a data repository. You can make a copy and start learning on your own.

Get certified in your dream Data Analyst role

Our certification programs help you stand out and prove your skills are job-ready to potential employers.

Get Your Certification

Topics

R Programming

Data Science

Learn more about R

Course

Intermediate Importing Data in R

3 hr

78.1K

Parse data in any format. Whether it's flat files, statistical software, databases, or data right from the web.

See Details

Start Course

Course

Introduction to Importing Data in R

3 hr

187.2K

In this course, you will learn to read CSV, XLS, and text files in R using tools like readxl and data.table.

See Details

Start Course

Course

Importing and Managing Financial Data in R

5 hr

19.1K

Learn how to access financial data from local files as well as from internet sources.

See Details

Start Course

A Data Science Roadmap for 2024

Do you want to start or grow in the field of data science? This data science roadmap helps you understand and get started in the data science landscape.

Mark Graus

10 min

Sorting Data in R

How to sort a data frame in R.

DataCamp Team

2 min

Merging Data in R

Merging data is a common task in data analysis, especially when working with large datasets. The merge function in R is a powerful tool that allows you to combine two or more datasets based on shared variables.

DataCamp Team

4 min

Operators in R

Learn how to use arithmetic and logical operators in R. These binary operators work on vectors, matrices, and scalars.

DataCamp Team

4 min

Axes and labels in R

Improve your graphs in R with titles, text annotations, labelling of points, minor tick marks, reference lines, custom axes, and a chart legend.

DataCamp Team

4 min

How to Transpose a Matrix in R: A Quick Tutorial

Learn three methods to transpose a matrix in R in this quick tutorial

Adel Nehme

See More See More

Getting Started with Importing Data into R

Commonly used Data Types

Importing data to R from a CSV and TXT files

Importing a CSV file in R

Importing a TXT file in R

Importing data from Excel into R

Importing data from a JSON file

Importing data from a Database using SQL in R

Importing data from XML and HTML files

Importing XML into R

Importing HTML Table into R

Other Data Types

Importing data from a SAS file

Importing data from a SPSS file

Importing Stata Files into R

Importing Matlab Files into R

Importing Binary Files into R

Importing Data using QuantMod into R

Importing Large Datasets into R

Import a Large Data using read.table

Import a Large Data using read.csv2

Import a Large Data using ff Package

Dataset Reference

Conclusion

Get certified in your dream Data Analyst role

A Data Science Roadmap for 2024

Sorting Data in R

Merging Data in R

Operators in R

Axes and labels in R

How to Transpose a Matrix in R: A Quick Tutorial

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Intermediate Importing Data in R

Introduction to Importing Data in R

Importing and Managing Financial Data in R

A Data Science Roadmap for 2024

Sorting Data in R

Merging Data in R

Operators in R

Axes and labels in R

How to Transpose a Matrix in R: A Quick Tutorial

Intermediate Importing Data in R