Strings in R Tutorial

Learn about strings in R: the rules for writing them, concatenation and other essential string operations, extracting and replacing parts of a character string, and formatting.
Apr 2020  · 8 min read

A string is a sequence of zero or more characters. It is enclosed in single quotes ('This is a string') or in double quotes ("This is also a string"). Whichever you use, R stores the string the same way and prints it with double quotes.

In this tutorial, you will learn about strings in R, and we'll cover the following topics.

  1. First, you will look at strings in R and the rules for writing them.
  2. After that, you'll dive into string concatenation and other essential string operations, such as finding the length and changing case.
  3. Then, extracting and replacing parts of a character string is explained in detail.
  4. Finally, you'll format values, converting any vector input into well-formatted strings.

Rule for String in R

    1. A string that starts with a single quote must end with a single quote. Double quotes can appear inside it as they are, and a single quote can be included by escaping it with a backslash (\').

For example, 'cars', 'merry"s', and 'merry\'s' are valid strings in R.

    2. A string that starts with a double quote must end with a double quote. Single quotes can appear inside it as they are, and a double quote can be included by escaping it with a backslash (\").

For example, "cars", "merry's", and "merry\"s" are also valid strings in R.
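The quoting rules above can be checked directly in the console. The following is a minimal sketch using only base R; the variable names are made up for illustration.

```r
# Three ways to write a string that contains a quote character.
in_double <- "merry's"      # single quote inside double quotes
escaped   <- 'merry\'s'     # escaped single quote inside single quotes
mixed     <- 'merry"s'      # double quote inside single quotes

# The first two are the same string; only the source spelling differs.
same <- identical(in_double, escaped)   # TRUE

# print() shows the string in its escaped, double-quoted form,
# while cat() writes the raw characters.
print(mixed)
cat(mixed, "\n")

# The backslash used for escaping is not stored in the string.
n_chars <- nchar(escaped)               # 7
```

The escape character only matters when typing the string; once parsed, both quoting styles produce identical values.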

Concatenation of String

Concatenation joins two or more strings into one.

Strings are concatenated in R with:
paste(..., sep = " ", collapse = NULL)
paste0(..., collapse = NULL)

The arguments mean the following:

  1. '...' - One or more R objects, which are converted to character vectors.
  2. 'sep' - The string placed between the corresponding elements of the inputs. It defaults to a single space in paste() and is always empty in paste0().
  3. 'collapse' - An optional string used to join the elements of the result into a single string.

For example, consider the code below, where each variable contains a character string:

my.var1 = "I"
my.var2 = "eat"
my.var3 = "rice"
print(paste(my.var1,my.var2,my.var3))

The above code gives a single string as below:

"I eat rice"

Now consider the code below, where each variable contains a vector and the result is collapsed into a single string:

my.var1 = c('Book','Copy')
my.var2 = c('Pen','Chair')
print(paste(my.var1,my.var2,sep=" ",collapse="-"))

The above code gives the following output:

"Book Pen-Copy Chair"

Here, sep = " " puts a space between the corresponding elements of the two vectors, giving 'Book Pen' and 'Copy Chair'. collapse = "-" then joins those two results into a single string, which is why a '-' appears between Pen and Copy.
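The difference between sep and collapse is easiest to see on vectors with more than one element. The sketch below uses a hypothetical vector named fruits, which is not part of the tutorial's own examples, and relies only on base R.

```r
# Hypothetical example data (not from the tutorial).
fruits <- c("apple", "pear", "fig")

# sep goes between corresponding elements. The shorter input ("item")
# is recycled, so the result still has three elements:
# "item:apple", "item:pear", "item:fig"
labelled <- paste("item", fruits, sep = ":")

# collapse joins the elements of the result into one string:
# "apple, pear, fig"
joined <- paste(fruits, collapse = ", ")

# Both together: pair up with sep first, then join with collapse:
# "item:apple | item:pear | item:fig"
both <- paste("item", fruits, sep = ":", collapse = " | ")

print(labelled)
print(joined)
print(both)
```

Without collapse the result has one element per input element; with collapse it always has length one.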

Let's consider an example using paste0().

paste0(...) is equivalent to paste(..., sep = ""), that is, it joins its arguments with no separator, and it is slightly more efficient.

state.name <- paste0('Mary','l','and')
print(state.name)

The output of the above code is below. All the pieces are merged into one string with nothing between them, because paste0() uses an empty separator.

"Maryland"

Finding Length

nchar(char, type, allowNA, keepNA) - Returns the number of characters in each element of a character vector.
nzchar(char) - Returns TRUE for each element that is a non-empty string and FALSE for each element that is the empty string.

The arguments above mean the following:

  1. char - A character vector, or an object that can be coerced to one.
  2. type - One of "chars" (the default), "bytes", or "width".
  3. allowNA - A logical value, FALSE by default. If TRUE, an element whose length cannot be computed (for example, an invalid multibyte string) gives NA instead of an error.
  4. keepNA - A logical value that controls how missing values are handled. If FALSE, nchar() returns 2 for a missing value (the number of characters printed for NA) and nzchar() returns TRUE. If TRUE, both return NA. The default for nzchar() is FALSE; the default for nchar() is NA, which behaves like TRUE unless type is "width".

The example below determines the length of a string:

my.char = nchar("hello")
print(my.char)

The above example returns '5' because there are five characters in the string.

For an example of using nzchar(), let's look at the code below:

new.char = nzchar("monkey")
print(new.char)

The above code gives the logical value 'TRUE' because "monkey" is not an empty string.

my.char = nzchar("")
print(my.char)

The above code gives the logical value 'FALSE' because "" is an empty string.
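Both functions are vectorised, and the keepNA argument only matters when the input contains missing values. The following sketch, which assumes a hypothetical vector named words, shows the behaviour on an ordinary word, an empty string, and NA.

```r
# Hypothetical input mixing a word, an empty string, and a missing value.
words <- c("hello", "", NA)

# Default for nchar(): the missing value stays missing.
lengths_default <- nchar(words)                  # 5, 0, NA

# keepNA = FALSE counts the two characters printed for NA.
lengths_no_na <- nchar(words, keepNA = FALSE)    # 5, 0, 2

# Default for nzchar(): a missing value counts as non-empty.
non_empty <- nzchar(words)                       # TRUE, FALSE, TRUE

# keepNA = TRUE propagates the missing value instead.
non_empty_na <- nzchar(words, keepNA = TRUE)     # TRUE, FALSE, NA

print(lengths_default)
print(non_empty)
```

When filtering out empty strings from real data, it is worth deciding explicitly which of these treatments of NA is wanted.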

Changing to Upper Case and Lower Case

toupper(char) - Changes all the characters present to uppercase.
tolower(char) - Changes all the characters present to lowercase.
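One common use of these two functions is comparing strings without regard to case. A minimal sketch, assuming a hypothetical vector of survey answers:

```r
# Hypothetical survey answers typed with inconsistent capitalisation.
answers <- c("Yes", "YES", "no", "yEs")

# Normalise case first, then compare against a lowercase target.
is_yes <- tolower(answers) == "yes"      # TRUE, TRUE, FALSE, TRUE

# Both functions are vectorised and leave digits, spaces, and
# punctuation untouched.
shouted <- toupper(c("r 4.0", "abc-123"))  # "R 4.0", "ABC-123"

print(is_yes)
print(shouted)
```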

Let's look at an example of changing to uppercase.

my.var = toupper("I'm changed to upper case")
print(my.var)

The above code gives the output below, where every character of the input string has been changed to uppercase and the result is stored in 'my.var'.

"I'M CHANGED TO UPPER CASE"

Let's look at an example of changing to lowercase.

my.var = tolower("I'm changed to lower case")
print(my.var)

The above code gives the output below, where every character of the input string has been changed to lowercase and the result is stored in 'my.var'.

"i'm changed to lower case"

Extracting and Replacing a Character String

substr(char, start, stop): Extracts or replaces part of a character string.

The arguments above mean the following:

  1. char - A character string (or character vector).
  2. start - An integer giving the position of the first character to extract or replace.
  3. stop - An integer giving the position of the last character to extract or replace.

Let's see an example where characters get extracted.

final.value <- substr('Remaining', 3, 9)
print(final.value)

In the above code, 'Remaining' is a string whose first character, 'R', has index 1. The last character, 'g', has index 9.

The above code gives the output below. Extraction starts at index 3, which corresponds to 'm' in 'Remaining', and continues through 'ainin' up to and including index 9, which corresponds to 'g'.

"maining"

Let's see another example where the characters get replaced.

y = "Remaining"
substr(y, 3, 5) <- "der"
print(y)

The above code gives the output below, where the characters of y at positions 3 to 5, 'mai', are replaced by 'der'.

"Rederning"

substring(char, first, last = 1000000L): Extracts or replaces part of a character string. It works like substr(), but 'last' is optional, so by default the extraction runs to the end of the string.

The arguments above mean the following:

  1. char - A character string (or character vector).
  2. first - An integer giving the position of the first character to extract or replace.
  3. last - An integer giving the position of the last character to extract or replace. It defaults to 1000000L.
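To illustrate the arguments listed above, the sketch below shows what the default value of last does and how vector arguments behave. It reuses the string 'Remaining' from the tutorial; the variable names are made up for illustration.

```r
word <- "Remaining"

# With 'last' omitted, substring() runs to the end of the string.
tail_part <- substring(word, 3)        # "maining"

# 'first' and 'last' can be vectors, giving several pieces at once:
# positions 1-3, 2-4, and 3-5.
pieces <- substring(word, 1:3, 3:5)    # "Rem", "ema", "mai"

# substr() needs both positions; a 'stop' past the end of the string
# is allowed and simply ends at the last character.
same_tail <- substr(word, 3, 100)      # "maining"

print(tail_part)
print(pieces)
```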

Let's see an example where characters get extracted.

my.value <- substring('Remaining', 3, 8)
print(my.value)

The above code gives the output below. Extraction starts at index 3, which corresponds to 'm' in 'Remaining', and continues up to and including index 8, which corresponds to 'n'.

"mainin"

Let's see another example where the characters get replaced.

y = "newcharacter"
substring(y, 6,9) <- "???"
print(y)

The above code gives the output below. Positions 6 to 9 of y are selected, but the replacement '???' has only three characters, so only positions 6 to 8 ('ara') are overwritten and the 'c' at position 9 is kept.

"newch???cter"

Formatting String

format(char, width, scientific, justify, nsmall, digits): Converts a vector, such as numbers or character strings, to character strings in a common format.

The arguments above mean the following:

  1. char - The vector to format, for example numbers or character strings.
  2. width - The minimum width of the strings produced.
  3. scientific - TRUE forces scientific notation and FALSE forces fixed notation. By default (NA), R picks the notation itself.
  4. justify - How character strings are padded to a common width: "left" (the default), "right", "centre", or "none".
  5. nsmall - The minimum number of digits after the decimal point.
  6. digits - The minimum number of significant digits to show.

Let's take an example of formatting numbers in scientific notation.

my.var <- format(c(2, 19.267), scientific = TRUE)
print(my.var)

The above code gives the output below, which is expressed in scientific (exponential) notation.

"2.0000e+00" "1.9267e+01"

The output "1.9267e+01" means 1.9267 x 10^1 = 1.9267 x 10 = 19.267.

Let's take an example of formatting using the nsmall.

my.var <- format(92.56656577, nsmall = 6)
print(my.var)

The above code gives the output below, which keeps at least 6 digits after the decimal point. The value is rounded, not truncated, at the sixth decimal place.

"92.566566"

Let's take an example of formatting using the digits.

my.var <- format(102.848793834, digits = 7)
print(my.var)

The above code gives the output below, which shows 7 significant digits, counting the digits both before and after the decimal point.

"102.8488"

Seven digits are shown. The seventh digit would be 7, but because the digit that follows it, 9, is 5 or greater, it is rounded up to 8.
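Because digits counts significant digits while nsmall counts digits after the decimal point, the two arguments can give quite different results on the same number. The sketch below is illustrative only and assumes R's default setting of options(digits = 7); the prices vector is hypothetical.

```r
# digits: minimum number of significant digits.
d_pi  <- format(pi, digits = 3)          # "3.14"
d_big <- format(1234.5678, digits = 3)   # "1235": the integer part is never cut

# nsmall: minimum number of digits after the decimal point.
n_two <- format(2, nsmall = 2)           # "2.00"
n_pi  <- format(pi, nsmall = 2)          # "3.141593": already more than 2

# On a vector, all elements share one format and one width.
prices <- format(c(3.5, 12, 104.25), nsmall = 2)
# "  3.50", " 12.00", "104.25"

print(prices)
```

Note that format() always returns character strings, so the results are meant for display rather than further arithmetic.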

Let's take an example of formatting with right justification.

my.var <- format('Running', justify = 'r',width = 9)
print(my.var)

The above code gives the output below, where the string 'Running' is right-justified and padded with spaces on the left to a total width of 9.

"  Running"

Let's take an example of formatting with centre justification.

my.var <- format('Running', justify = 'c',width = 12)
print(my.var)

The above code gives the output below, where the string 'Running' is centred in a field of width 12, with the padding split between the left and the right.

"  Running   "

Congratulations

Congratulations, you have made it to the end of this tutorial!

You've learned about strings in R: the rules for writing them, concatenation and other essential string operations, extracting and replacing parts of a character string, and formatting.

If you would like to learn more about R, take DataCamp's Introduction to R course.

Check out our Data Types in R tutorial.
