Using Functions in R Tutorial
If you are considering starting a career in data science, the sooner you start coding, the better. No matter the programming language you’ve picked to start your learning journey, at some point you will have a date with functions.
Functions are a central concept in nearly every modern programming language, including popular programming languages in data science, such as Python, Julia, and, obviously, R.
In this tutorial, we will explore what R functions are and how you can use them. By covering the purpose, syntax, and typology of functions available in R, you will get everything you need to master this critical concept in programming. And what’s more, you will be introduced to the art of function creation.
What is a Function in R?
In programming, functions are instructions organized together to carry out a specific task. The rationale behind functions is to create self-contained programs that can be called only when needed.
With functions, programmers no longer need to rewrite the same code from scratch, which avoids repetition and improves code robustness and readability. That's why, as a rule of thumb, it's good practice to create a function whenever you expect to run a particular set of instructions more than twice in your code.
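Here is a minimal sketch of that rule of thumb. Instead of copying a rescaling formula for every vector, the formula is written once inside a function and then reused. The name `rescale01` and the formula are illustrative choices, not taken from the original text.

```r
# Hypothetical helper: rescale a numeric vector to the 0-1 range
rescale01 <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# The formula is written once and reused for every vector
rescale01(c(2, 4, 6))     # 0.0 0.5 1.0
rescale01(c(10, 20, 30))  # 0.0 0.5 1.0
rescale01(c(5, 0, 10))    # 0.5 0.0 1.0
```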
Functions can be used for endless purposes and can take various forms. Generally, the vast majority of functions take input data, process it, and return a result. The data the function operates on is specified by the so-called arguments, which can also be used to control or alter the way the function carries out its task.
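The built-in `mean()` and `round()` functions show both roles. Their first argument is the data, while `na.rm` and `digits` change how the task is carried out. The vector `x` is just example data.

```r
x <- c(1, NA, 3)

# x is the data the function operates on
mean(x)                    # NA, because one value is missing
# na.rm changes how the function does its job
mean(x, na.rm = TRUE)      # 2, the missing value is dropped first

# digits controls the precision of the result
round(3.14159)             # 3
round(3.14159, digits = 2) # 3.14
```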
Depending on the origin of the function, we can distinguish three main types of functions in R:
- Built-in functions
- Functions available in R packages
- User-defined functions (UDFs)
In the following sections, we will explain the particularities of the different types of functions available in R.
Built-in Functions in R
R is a powerful programming language that comes with a wide catalog of built-in functions that can be called at any time. As a language designed for statistics and data analysis, R includes many functions to perform numeric operations. Below is a list of some of the most useful:
- print(). Displays an R object on the R console
- min(), max(). Calculate the minimum and maximum of a numeric vector
- sum(). Calculates the sum of a numeric vector
- mean(). Calculates the mean of a numeric vector
- range(). Returns both the minimum and maximum values of a numeric vector
- str(). Displays the structure of an R object
- ncol(). Returns the number of columns of a matrix or a data frame
- length(). Returns the number of items in an R object, such as a vector, a list, or a matrix
In the code below, you can see how simple it is to use these functions to calculate some statistics of a vector:
> v <- c(1, 3, 0.2, 1.5, 1.7)
> print(v)
[1] 1.0 3.0 0.2 1.5 1.7
> sum(v)
[1] 7.4
> mean(v)
[1] 1.48
> length(v)
[1] 5
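The other functions in the list work the same way. Here is a quick sketch that reuses the vector `v` and adds a small data frame `df`, which is made up for illustration:

```r
v <- c(1, 3, 0.2, 1.5, 1.7)
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

min(v)      # smallest value: 0.2
max(v)      # largest value: 3
range(v)    # both at once: 0.2 3.0
ncol(df)    # number of columns: 2
length(df)  # a data frame is a list of columns, so this is also 2
str(df)     # compact description of the columns and their types
```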
You can learn more about R functions in our comprehensive R skill track, which will help you learn to code like a programmer.
Functions in R Packages
Numerous and diverse as they are, built-in functions are not enough for all the cool stuff you can do with R, from plotting compelling data visualizations to training powerful machine learning models.
Most of the functions for these tasks are available in external packages, also called libraries. Packages are collections of R functions, data, and compiled code in a well-defined format, created to add specific functionality. Most packages are free to use and can be found in popular package repositories such as CRAN, which currently features nearly 20,000 contributed packages.
To use the functions in a package, you will first need to install it. For example, to install stringr, a popular package for working with strings and regular expressions, you can use the following statement:
install.packages('stringr')
Once it is installed, load it into your R session with the library() function:
library(stringr)
Now you're ready to use all the functions available in the stringr package. For example, let's try the str_detect() function, which returns a logical vector with TRUE for each element of the input string that matches the pattern and FALSE otherwise.
> str_detect('DataCamp', 'Data')
[1] TRUE
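Because str_detect() works on whole vectors, it returns one TRUE or FALSE per element, and its pattern is a regular expression. Here is a short sketch that assumes stringr is installed as shown above. The `words` vector is example data.

```r
library(stringr)

words <- c("DataCamp", "RStudio", "Data science")

# One logical value per element of the input vector
str_detect(words, "Data")  # TRUE FALSE TRUE

# The pattern is a regular expression, so anchors such as ^ work too
str_detect(words, "^R")    # FALSE TRUE FALSE
```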
If you're interested in learning more about R packages and how to use them, check out this DataCamp R packages tutorial.
User-Defined Functions
The best way to understand how functions in R work is to create your own. So-called user-defined functions (UDFs) are written by programmers to carry out a specific task.
R functions normally follow this syntax:
function_name <- function(argument_1, argument_2) {
  # function body
  return(output)
}
We can distinguish four main elements:
- Function name. To create a UDF, you first assign it to a name, which saves it as a new object. You then call that name whenever you want to use the function.
- Arguments. The function arguments (also known as parameters) are listed within the parentheses. Arguments tell the function what data to take as input and/or how to modify its behavior.
- Function body. Within curly braces comes the body of the function, that is, the instructions that solve a specific task based on the information provided by the arguments.
- Return statement. The return statement specifies the value the function hands back to the caller, which you can then store in a variable. In R it is optional, as explained later in this tutorial.
For example, if you want to create a function that calculates the mean of two numbers:
mean_two_numbers <- function(num_1, num_2) {
  # 'avg' rather than 'mean', to avoid reusing the name of base R's mean()
  avg <- (num_1 + num_2) / 2
  return(avg)
}
Now, to calculate the mean of 10 and 20, just call the function as follows:
> mean_two_numbers(10, 20)
[1] 15
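Because a function is saved as an ordinary R object, you can inspect and reassign it like any other object. The sketch below uses the base functions class(), formals(), and body(). The name `average_of_two` is hypothetical.

```r
mean_two_numbers <- function(num_1, num_2) {
  avg <- (num_1 + num_2) / 2
  return(avg)
}

class(mean_two_numbers)           # "function"
names(formals(mean_two_numbers))  # "num_1" "num_2"
body(mean_two_numbers)            # the code between the curly braces

# Like any object, a function can be assigned to another name
average_of_two <- mean_two_numbers
average_of_two(10, 20)            # 15
```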
Types of arguments in R functions
Arguments are vital elements in every function. While it's perfectly possible to write a function with no parameters (see the example below), most functions do have arguments. That makes sense, because arguments tell the function what data to take as input. Likewise, if we want to give a function several ways of performing a task, arguments will do the job.
hello <- function() {
  print('hello, my friend')
}
> hello()
[1] "hello, my friend"
There is no limit on the number of arguments in R functions; you can add them in the parentheses separated by commas. Generally, functions with more arguments tend to be more complex.
Once you have created a function with parameters, every time you use it you need to provide values for those parameters. Otherwise, R throws an error as soon as the function body tries to use the missing value. For example, if you don't provide both numbers to our function, it can't calculate their mean.
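Here is a minimal sketch of what happens when an argument is missing. It uses tryCatch() to capture the error message instead of letting the error stop the script.

```r
mean_two_numbers <- function(num_1, num_2) {
  avg <- (num_1 + num_2) / 2
  return(avg)
}

# Calling the function without num_2 raises an error;
# tryCatch() turns it into a character string we can print
result <- tryCatch(
  mean_two_numbers(10),
  error = function(e) conditionMessage(e)
)
print(result)  # the message says that num_2 is missing, with no default
```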
However, you can avoid this error by using default arguments when you define a function. A default argument provides a value that is used if you call the function without supplying that argument. Let's go back to our function to calculate the mean of two numbers. This time, we will give the second number a default value.
mean_two_numbers <- function(num_1, num_2 = 30) {
  avg <- (num_1 + num_2) / 2
  return(avg)
}
If we now call the function without providing a value for the num_2 parameter, R automatically uses the default value (i.e., 30):
> mean_two_numbers(num_1 = 10)
[1] 20
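A default is only a fallback, so passing a value explicitly overrides it. Here is a quick check with the same function:

```r
mean_two_numbers <- function(num_1, num_2 = 30) {
  avg <- (num_1 + num_2) / 2
  return(avg)
}

mean_two_numbers(10)                      # default used: (10 + 30) / 2 = 20
mean_two_numbers(10, 50)                  # default overridden: (10 + 50) / 2 = 30
mean_two_numbers(num_2 = 40, num_1 = 10)  # overridden by name: (10 + 40) / 2 = 25
```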
Understanding Return Values in R Functions
Functions normally take some data as input and give a result as output. In some programming languages, you must explicitly include a return statement at the end of the function body if you want to save the function's result in a variable. Otherwise, the function hands back no value, and whatever it computed exists only within the scope of the function.
This is not the case in R. A function automatically returns the value of the last expression it evaluates, and that value can be stored in a variable. However, for the sake of readability, it's good practice to include the return statement when defining a function.
mean_two_numbers <- function(num_1, num_2) {
  # Function with return
  avg <- (num_1 + num_2) / 2
  return(avg)
}
mean_two_numbers_2 <- function(num_1, num_2) {
  # Function without return: the last expression is returned
  avg <- (num_1 + num_2) / 2
  avg
}
> mean_two_numbers(10, 50)
[1] 30
> mean_two_numbers_2(10, 50)
[1] 30
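An explicit return() is also useful for exiting a function early. In the sketch below, safe_divide() is a hypothetical function that stops and returns NA when the divisor is zero. Otherwise, it returns the value of its last expression implicitly.

```r
# Hypothetical example: return() used to exit early
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA)  # stop here; the division below is never reached
  }
  x / y         # last expression, returned implicitly
}

safe_divide(10, 2)  # 5
safe_divide(10, 0)  # NA
```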
Finally, if you want your function to return multiple values, store the different results in a list and return that list:
mean_sum <- function(num_1, num_2) {
  avg <- (num_1 + num_2) / 2
  total <- num_1 + num_2
  # Bundle both results into a single list
  return(list(avg, total))
}
> mean_sum(10, 20)
[[1]]
[1] 15
[[2]]
[1] 30
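Naming the list elements makes the results easier to work with, because you can then access them with `$`. Here is a variant of mean_sum() with named elements. The names `mean` and `sum` are an illustrative choice.

```r
mean_sum <- function(num_1, num_2) {
  avg <- (num_1 + num_2) / 2
  total <- num_1 + num_2
  # Named elements can be accessed with $ instead of [[1]] and [[2]]
  return(list(mean = avg, sum = total))
}

result <- mean_sum(10, 20)
result$mean  # 15
result$sum   # 30
```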
Calling Functions in R
In the previous sections, we have already seen various examples of how to call a function. However, it's important to clarify how R matches the values we pass to the function's arguments.
R supports two ways of passing arguments: by position and by name. With the first strategy, we write the values in the same order as the arguments in the function definition.
If we pass the arguments by name, we explicitly specify the name of each argument along with its value. Since each value is matched to a named argument, the order doesn't matter.
Finally, it's also possible to mix the two strategies. In this case, the named arguments are matched first, and the remaining arguments are then matched by position.
hello <- function(name, surname) {
  print(paste('Hello,', name, surname))
}
# Passing arguments by position
> hello('Greta', 'Thunberg')
[1] "Hello, Greta Thunberg"
# Passing arguments by name
> hello(surname = 'Thunberg', name = 'Greta')
[1] "Hello, Greta Thunberg"
# Passing arguments by position and by name
> hello(surname = 'Thunberg', 'Greta')
[1] "Hello, Greta Thunberg"
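To see the matching order more clearly, consider a hypothetical function describe() with three arguments. R matches the named argument first, and the unnamed values then fill the remaining arguments in order.

```r
# Hypothetical three-argument function to illustrate argument matching
describe <- function(first, second, third) {
  paste("first =", first, "| second =", second, "| third =", third)
}

# 'third' is matched by name; "a" and "b" then fill first and second in order
describe("a", third = "c", "b")
# "first = a | second = b | third = c"
```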
Documenting Functions in R
A good practice when creating functions is to document how to use them, especially when they are complex. An informal way of doing this is to add comments in the body of the function. You can then view these comments by typing the function's name without parentheses, which prints its source code:
hello <- function(name, surname) {
  # Say hello to a person with name and surname
  print(paste('Hello,', name, surname))
}
> hello
function(name, surname) {
  # Say hello to a person with name and surname
  print(paste('Hello,', name, surname))
}
However, if your function is part of a larger package and you want to document it formally, you should write the documentation in a separate .Rd file. You can see the result of this kind of documentation when you look at the help file for a given function, e.g., ?mean.
Conclusion
You made it to the end of the tutorial. Congratulations! As in many other languages, functions are vital elements in R. Whether they are built-in, part of external packages, or created by you, mastering functions is an important milestone in your programming journey. If you want to keep developing your skills with functions in R, check out the following resources!
R courses:
- Introduction to R
- Intermediate R