
Data Types in R

Learn about data types and their importance in a programming language. More specifically, learn how to use various data types like vector, matrices, lists, and dataframes in the R programming language.
Jan 2020  · 12 min read


Before we start with the introduction and learn about various data types in R, let's quickly set up the R environment both on the Terminal and Jupyter Notebook.

The following command installs R on macOS using Homebrew, run from the terminal.

brew install r --build-from-source

To verify that the installation succeeded, type R (upper-case) in the terminal to start an R session.

For installation on other operating systems, feel free to check this tutorial.

Now let's add the R programming language as a kernel in Jupyter Notebook. Make sure Jupyter Notebook is already installed on your system.

In your terminal, open an R session and enter the two commands below, which add the R kernel to Jupyter Notebook.


Once both commands have succeeded, run jupyter from the terminal and open a notebook with the R kernel.

Now you are all set to write your first R code on jupyter notebook.


To make use of R to the fullest, it is very important to know and understand various data types and data structures that exist in R and how they function. They play a key role in almost all problems and especially when you are working on machine learning problems, which are very data-centric.

In a programming language, variables store information such as integers, characters, floating-point numbers, and booleans. The type of a variable depends on the kind of information it holds: a variable that is assigned an integer has an integer data type. A variable is a name for a reserved memory location where a value is stored, so creating a variable reserves some memory for it.

The amount of memory allocated depends on the data type. In C, for example, an int typically occupies 4 bytes and a char 1 byte. In R, every value is a vector object that carries some fixed overhead; beyond that, each element of an integer vector takes 4 bytes and each element of a numeric (double) vector takes 8 bytes.
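Memory use can be inspected from within R. The sketch below is only an illustration: it uses object.size() from the utils package to compare an integer vector with a numeric (double) vector of the same length. The names ints and dbls are chosen here for the example, and the exact byte counts depend on the platform, so only the comparison is shown.

```r
# Two vectors of the same length but different basic types
ints <- integer(1000)   # 1000 integer zeros
dbls <- numeric(1000)   # 1000 double zeros

# object.size() estimates the memory used by an object, in bytes
int_size <- as.numeric(utils::object.size(ints))
dbl_size <- as.numeric(utils::object.size(dbls))

print(int_size)
print(dbl_size)

# The double vector needs more memory than the integer vector
dbl_size > int_size
```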

In languages like C, C++, and Java, variables are declared with a data type; in Python and R, variables refer to objects. An object is a data structure that has attributes, along with methods that operate on those attributes.

There are various kinds of R objects, or data structures, that this tutorial discusses:

  • Vectors

  • Lists

  • Matrices

  • Arrays

  • Factors

  • Data Frames

Let's first understand some of the basic datatypes on which the R-objects are built like Numeric, Integer, Character, Factor, and Logical.

  • Numeric: Numbers with a decimal part have the data type numeric. Numeric is also R's default type for numbers, so a literal such as 2 is stored as numeric unless it is converted.
num <- 1.2
[1] 1.2

You can check the data type of a variable using the class() function.

  • Integer: Whole numbers can be stored with the data type integer. R does not do this by default, so to create an integer you either pass a value to as.integer(), which drops any decimal part, or append L to the literal, as in 2L.
int <- as.integer(2.2)
[1] 2
  • Character: Any letter or combination of letters enclosed in quotes is treated by R as the character data type. The quoted text can contain letters or digits.
char <- "datacamp"
[1] "datacamp"
char <- "12345"
[1] "12345"
  • Logical: A variable that holds the value TRUE or FALSE, like a boolean, is called a logical variable.
log_true <- TRUE
[1] TRUE
log_false <- FALSE
  • Factor: A factor is a data type for categorical data such as colors, good and bad, or course and movie ratings. Factors are useful in statistical modeling.

To create a factor, you first use the c() function, which combines its arguments into a one-dimensional vector, and then pass that vector to factor().

fac <- factor(c("good", "bad", "ugly","good", "bad", "ugly"))
[1] good bad  ugly good bad  ugly
Levels: bad good ugly

The fac factor has three levels: bad, good, and ugly. You can check them with the levels() function; the levels themselves are of type character.

  1. 'bad'
  2. 'good'
  3. 'ugly'
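The checks described in this section can be sketched as follows. This is a minimal illustration that recreates the example variables from the text and inspects them with class() and levels(); the name fac_levels is introduced here only for the example.

```r
# Recreate the example values from the text
num <- 1.2
int <- as.integer(2.2)   # the decimal part is dropped, giving 2
char <- "datacamp"
log_true <- TRUE
fac <- factor(c("good", "bad", "ugly", "good", "bad", "ugly"))

# class() reports the data type of each object
class(num)       # "numeric"
class(int)       # "integer"
class(char)      # "character"
class(log_true)  # "logical"
class(fac)       # "factor"

# levels() lists the distinct categories of a factor
fac_levels <- levels(fac)
fac_levels         # "bad" "good" "ugly"
class(fac_levels)  # "character"
```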

Before moving forward, let us understand a couple of important tips that can come in handy!

  • Always remember that R is case-sensitive. An object must be referenced with exactly the capitalization used when it was defined. For example, referring to num as Num produces the following error.
Error in eval(expr, envir, enclos): object 'Num' not found
  • In R, you can list all the variables or objects that you have defined in the working environment by using the ls() function, as shown below.
  1. 'char'
  2. 'int'
  3. 'num'
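Both tips can be tried in a short sketch. It assumes only that num has been defined as above; tryCatch() is used here so that the error message is captured rather than stopping the script, and msg is a name chosen for this example. The exact wording of the message depends on the session language.

```r
num <- 1.2

# 'Num' and 'num' are different names, so looking up 'Num' fails.
# tryCatch() returns the error message instead of halting execution.
msg <- tryCatch(Num, error = function(e) conditionMessage(e))
print(msg)

# ls() returns the names of the objects defined in the current environment
"num" %in% ls()
```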


Lists

Unlike vectors, a list can contain elements of various data types and is often known as an ordered collection of values. It can contain vectors, functions, matrices, and even another list inside it (nested-list).

Lists in R are one-indexed, i.e., the index starts with one.

Let's understand the concept of lists with a quick example that stores three different data types in one list.

lis1 <- 1:5  # Integer Vector
[1] 1 2 3 4 5
lis2 <- factor(1:5)  # Factor Vector
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
lis3 <- letters[1:5]  # Character Vector
[1] "a" "b" "c" "d" "e"
combined_list <- list(lis1, lis2, lis3)
[[1]]
[1] 1 2 3 4 5

[[2]]
[1] 1 2 3 4 5
Levels: 1 2 3 4 5

[[3]]
[1] "a" "b" "c" "d" "e"

Let's access each vector in the list separately. To do this, use double square brackets, since the three vectors sit at the top level of the list.

combined_list[[1]]
[1] 1 2 3 4 5
combined_list[[2]]
[1] 1 2 3 4 5
Levels: 1 2 3 4 5
combined_list[[3]]
[1] "a" "b" "c" "d" "e"

Now, let us try to access the fifth element from the third vector, which gives the letter e.
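One way to write this lookup is to chain the two kinds of brackets. The sketch below rebuilds the list so that it runs on its own; fifth_letter is a name chosen for this example.

```r
combined_list <- list(1:5, factor(1:5), letters[1:5])

# [[3]] extracts the third element (the character vector),
# and [5] then selects the fifth value inside that vector
fifth_letter <- combined_list[[3]][5]
fifth_letter  # "e"
```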


Finally, let's flatten the list with unlist(). One important thing to remember is that because combined_list mixes character and numeric data, the character type takes precedence, and every element of the flattened result becomes a character.

flat_list <- unlist(combined_list)
  1. '1'
  2. '2'
  3. '3'
  4. '4'
  5. '5'
  6. '1'
  7. '2'
  8. '3'
  9. '4'
  10. '5'
  11. 'a'
  12. 'b'
  13. 'c'
  14. 'd'
  15. 'e'
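The effect of flattening can be checked with a few helper calls. This is a small sketch that rebuilds the same list and inspects the result of unlist().

```r
combined_list <- list(1:5, factor(1:5), letters[1:5])
flat_list <- unlist(combined_list)

class(flat_list)    # "character"
length(flat_list)   # 15
is.list(flat_list)  # FALSE: the result is a plain vector, no longer a list
```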


Vectors

A vector is an object that stores multiple values of the same data type. A vector cannot hold a mix of integer and character values. For example, to store the total marks of 100 students, instead of creating 100 separate variables, you would create one vector of length 100 that holds all the marks.

A vector can be created with a function c(), which will combine all the elements and return a one-dimensional array.

Let's create a vector marks with data of five students of class numeric.

marks <- c(88,65,90,40,65)

Let us check the length of the vector, which should return the number of elements contained in it.


Now, let's try to access a specific element by its index.

marks[6] #returns NA since there is no sixth element in the vector
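The length check and the index lookups described above can be sketched as follows, using the same marks vector.

```r
marks <- c(88, 65, 90, 40, 65)

length(marks)  # 5
marks[1]       # 88, because indexing starts at 1
marks[5]       # 65, the last element
marks[6]       # NA, because there is no sixth element
```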


  • Slicing: Similar to Python, the concept of slicing can be applied in R as well.

    Let's try to access elements from second to fifth using slicing.

  1. 65
  2. 90
  3. 40
  4. 65
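A slice that produces the four values above might look like the sketch below. Note that, unlike in Python, both ends of the range 2:5 are included. The name second_to_fifth is chosen for this example.

```r
marks <- c(88, 65, 90, 40, 65)

# 2:5 expands to the indices 2, 3, 4, 5 (both ends included)
second_to_fifth <- marks[2:5]
second_to_fifth  # 65 90 40 65

# A negative index drops an element instead of selecting it
marks[-1]        # 65 90 40 65
```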

Let's now create a character vector, which works the same way as creating a numeric vector.

char_vector <- c("a", "b", "c")
[1] "a" "b" "c"


  1. 'a'
  2. 'b'
  3. 'c'

If we create a vector that has both numeric and character values, the numeric values will get converted to a character data type.

char_num_vec <- c(1,2, "a")
  1. '1'
  2. '2'
  3. 'a'

Let's create a vector of 1024 numeric values using the colon operator, which generates a sequence from 1 to 1024.

vec <- c(1:1024)

Now, try to access the middle and the last element. To do that, you will use the length function.
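A minimal sketch of that lookup is shown below. The names n, middle, and last are chosen for this example, and "middle" is taken to mean the element at position length / 2.

```r
vec <- 1:1024

n <- length(vec)      # 1024
middle <- vec[n / 2]  # element at position 512
last <- vec[n]        # element at position 1024

middle  # 512
last    # 1024
```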

  • How do you create a vector of odd numbers?

To create a vector of odd numbers, you can use the function seq, which takes in three parameters: start, end, and step size.

seq(1,10, by = 2)
  1. 1
  2. 3
  3. 5
  4. 7
  5. 9


Matrices

Like a vector, a matrix stores values of a single data type. Unlike a vector, a matrix arranges those values in two dimensions, as rows and columns.

The syntax of defining a matrix is:

M <- matrix(vector, nrow=r, ncol=c, byrow=FALSE, dimnames=list(char_vector_rownames, char_vector_colnames))

byrow=TRUE signifies that the matrix should be filled by rows. byrow=FALSE indicates that the matrix should be filled by columns (the default).
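The difference between the two fill orders is easiest to see with numbers. The sketch below fills the same six values both ways; values, by_col, and by_row are names chosen for this example.

```r
values <- 1:6

by_col <- matrix(values, nrow = 2, ncol = 3)                # default, byrow = FALSE
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```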

Let's quickly define a matrix M of shape $2\times3$.

M = matrix( c('AI','ML','DL','Tensorflow','Pytorch','Keras'), nrow = 2, ncol = 3, byrow = TRUE)
     [,1]         [,2]      [,3]   
[1,] "AI"         "ML"      "DL"   
[2,] "Tensorflow" "Pytorch" "Keras"

Let's use the slicing concept and fetch elements from a row and column.

M[1:2,1:2] # the first index selects both rows, the second selects the 1st and 2nd columns
     [,1]         [,2]     
[1,] "AI"         "ML"     
[2,] "Tensorflow" "Pytorch"
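The dimnames argument from the syntax above lets you index by name as well as by position. The following sketch rebuilds M with row and column labels; the labels themselves are hypothetical and were picked only for this illustration.

```r
M <- matrix(
  c("AI", "ML", "DL", "Tensorflow", "Pytorch", "Keras"),
  nrow = 2, ncol = 3, byrow = TRUE,
  # hypothetical row and column names, not part of the original example
  dimnames = list(c("field", "framework"), c("first", "second", "third"))
)

dim(M)                    # 2 3
M["framework", "second"]  # "Pytorch"
M[2, ]                    # the whole second row, as a named character vector
```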


Data Frames

A data frame is a more general form of a matrix. It holds data in tabular form, and unlike a matrix, its columns can have different data types. The first column can be character, the second integer, and the third logical.

The variables or features are arranged in columns, whose names form the header, while the observations are arranged in rows. Each data row starts with the row name, followed by the actual data.

A data frame can be created using the data.frame() function.

Data frames are widely used when reading comma-separated (CSV) files and text files. Their use is not limited to reading data: you can also use them for machine learning problems, especially when dealing with numerical data. Data frames are useful for understanding the data, data wrangling, plotting, and visualizing.

Let's create a dummy dataset and learn some data frame specific functions.

dataset <- data.frame(
   Person = c("Aditya", "Ayush","Akshay"),
   Age = c(26, 26, 27),
   Weight = c(81,85, 90),
   Height = c(6,5.8,6.2),
   Salary = c(50000, 80000, 100000)
)
  Person Age Weight Height Salary
1 Aditya  26     81    6.0  5e+04
2  Ayush  26     85    5.8  8e+04
3 Akshay  27     90    6.2  1e+05
nrow(dataset) # this will give you the number of rows that are there in the dataset dataframe
ncol(dataset) # this will give you the number of columns that are there in the dataset dataframe
df1 = rbind(dataset, dataset) # a row bind which will append the arguments in row fashion.
A data.frame: 6 × 5
Person Age Weight Height Salary
<fct> <dbl> <dbl> <dbl> <dbl>
Aditya 26 81 6.0 5e+04
Ayush 26 85 5.8 8e+04
Akshay 27 90 6.2 1e+05
Aditya 26 81 6.0 5e+04
Ayush 26 85 5.8 8e+04
Akshay 27 90 6.2 1e+05
df2 = cbind(dataset, dataset) # a column bind which will append the arguments in column fashion.
A data.frame: 3 × 10
Person Age Weight Height Salary Person Age Weight Height Salary
<fct> <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
Aditya 26 81 6.0 5e+04 Aditya 26 81 6.0 5e+04
Ayush 26 85 5.8 8e+04 Ayush 26 85 5.8 8e+04
Akshay 27 90 6.2 1e+05 Akshay 27 90 6.2 1e+05

Let's look at the head function which is very useful when you have millions of records and you want to look at only the first few rows of your data. Similarly, the tail function will output the last few rows of your data.

head(df1,3) # here only three rows will be printed
A data.frame: 3 × 5
  Person Age Weight Height Salary
  <fct> <dbl> <dbl> <dbl> <dbl>
1 Aditya 26 81 6.0 5e+04
2 Ayush 26 85 5.8 8e+04
3 Akshay 27 90 6.2 1e+05
str(dataset) #this returns the individual class or data type information for each column.
'data.frame':    3 obs. of  5 variables:
 $ Person: Factor w/ 3 levels "Aditya","Akshay",..: 1 3 2
 $ Age   : num  26 26 27
 $ Weight: num  81 85 90
 $ Height: num  6 5.8 6.2
 $ Salary: num  5e+04 8e+04 1e+05
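The output above shows Person as a factor, which was the default when strings were passed to data.frame() in R versions before 4.0.0. From R 4.0.0 onward the default is stringsAsFactors = FALSE, so the same call would report Person as chr. The sketch below sets the argument explicitly so that it behaves the same on either version; it uses a shortened two-column version of the dataset.

```r
dataset <- data.frame(
  Person = c("Aditya", "Ayush", "Akshay"),
  Age = c(26, 26, 27),
  stringsAsFactors = TRUE  # set explicitly to reproduce the factor column shown above
)

# class of every column, returned as a named character vector
sapply(dataset, class)

dataset$Age           # 26 26 27, a single column accessed by name
dataset[2, "Person"]  # Ayush, the value in row 2 of the Person column
```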

Now let's look at the summary() function, which comes in handy when you want to understand the statistics of your dataset. As shown below, for each numeric column it reports the minimum, first quartile, median, mean, third quartile, and maximum, which gives you some intuition about the distribution of your data. For factor columns it reports the count of each level, and if a column has missing values it also reports the number of NAs.

    Person       Age            Weight          Height        Salary      
 Aditya:1   Min.   :26.00   Min.   :81.00   Min.   :5.8   Min.   : 50000  
 Akshay:1   1st Qu.:26.00   1st Qu.:83.00   1st Qu.:5.9   1st Qu.: 65000  
 Ayush :1   Median :26.00   Median :85.00   Median :6.0   Median : 80000  
            Mean   :26.33   Mean   :85.33   Mean   :6.0   Mean   : 76667  
            3rd Qu.:26.50   3rd Qu.:87.50   3rd Qu.:6.1   3rd Qu.: 90000  
            Max.   :27.00   Max.   :90.00   Max.   :6.2   Max.   :100000  
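A call that produces a table like the one above is sketched here on the numeric columns of the dataset. The name age_summary is chosen for this example, and the final line is one possible way to count missing values per column.

```r
dataset <- data.frame(
  Age = c(26, 26, 27),
  Weight = c(81, 85, 90),
  Height = c(6, 5.8, 6.2),
  Salary = c(50000, 80000, 100000)
)

# Summary statistics for every column of the data frame
summary(dataset)

# summary() also works on a single column and returns a named vector
age_summary <- summary(dataset$Age)
age_summary[["Median"]]  # 26
age_summary[["Mean"]]    # about 26.33

# Number of missing values in each column (all zero for this dataset)
colSums(is.na(dataset))
```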


Congratulations on finishing the tutorial.

This tutorial was a good starting point for beginners who are curious to learn the R programming language. As a good exercise, feel free to check out more helper functions related to each data type.

There is a lot more to explore in R, such as Conditionals and Control Flow in R, Utilities in R, and the most exciting one, Machine Learning using R. These will be covered in future tutorials, so stay tuned!

Please feel free to ask any questions related to this tutorial in the comments section below.

If you would like to learn more about R, take DataCamp's Intermediate R course and check out the Introduction to Data frames in R tutorial.
