A Tutorial on Using Functions in R!

This tutorial explains what R functions are, how to write user-defined functions, how scoping works in R, and much more.

In a previous post, you covered part of the R language control flow: the cycles or loop structures. In a subsequent one, you learned how to avoid looping by using the apply() family of functions, which act on compound data in repetitive ways. This post introduces the notion of a function from the R programmer’s point of view and illustrates the range of action that functions have within R code.

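The post will cover the following topics:

  • What Is A Function?
  • Functions in R
  • What Are The Most Popular Functions in R?
  • User Defined Functions (UDF)
  • How Can You See Your R Function in RStudio?
  • Calling R Functions Defined In Other Scripts
  • Nested Function Calls in R
  • Function Arguments And Their Default
  • Anonymous Functions in R
  • Functions And Functional Programming in R
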
(To practice, try DataCamp's Writing Functions in R course.)

What Is A Function?

In programming, you use functions to package sets of instructions that you want to use repeatedly or that, because of their complexity, are better self-contained in a subprogram and called when needed. A function is a piece of code written to carry out a specified task; it may or may not accept arguments (parameters), and it may or may not return one or more values.

Now then how generic is that!

In fact, there are several possible formal definitions of ‘function’, spanning from mathematics to computer science. Generically, a function’s arguments constitute its input and its return values constitute its output.

Here, you’ll use a simple definition that drops the mathematical restriction that each input is related to exactly one output. You will see that there are functions that operate on some of the input values, perhaps giving multiple results, depending on how they are internally constructed.

Functions in R

There are a number of terms for these constructs (functions, subroutines, procedures, methods, etc.), but for the purposes of this post you will ignore the distinction, which is often semantic and reminiscent of older programming languages. You’ll denote each of those constructs generically as ‘functions’, especially because in R we just have… functions!

(For the horrified reader, here’s a link: semantics).

In R, according to the base docs, you define a function with the construct

function (arglist)  {body}

where the code in between the curly braces is the body of the function.

Note that by using built-in functions, the only thing you need to worry about is how to effectively communicate the correct input arguments (arglist) and manage the return value(s), if there are any.
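As a small illustration of this point, the sketch below calls two base R functions, round() and range(). The variable names x, rounded and limits are made up for the example; only the input arguments and the returned values matter to the caller.

```r
# Values used only for illustration
x <- c(3.14159, 2.71828, 1.41421)

# round() takes the data plus a `digits` argument and returns a vector
rounded <- round(x, digits = 2)

# range() returns two values at once (minimum and maximum) in one vector
limits <- range(x)

print(rounded)
print(limits)
```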

What Are The Most Popular Functions in R?

Now, given the enormous number of functions and libraries in R, how do you orient yourself to decide which are the ones to learn and master? And because many functions appear in distinct packages (libraries), shouldn’t you also know which libraries to use?

Tip: learn more about the difference between R packages and libraries in DataCamp’s Beginner’s Guide to R Packages.

Resorting to data science, you see that somebody has already considered this:


So up to this point, you’ve only learned that there are a lot of R functions organized in a multitude of packages, and that the hardest job is to determine which parameters to pass (the arguments or args) and how to handle the return values.

So, the best way to learn more about the inner workings of functions is to write your own.

User Defined Functions (UDF)

Sooner or later you will need to accomplish a particular task without knowing whether a dedicated function or library already exists, or you will realize that you could have written your own solution in the time spent googling for an existing one. Either way, at some point you will find yourself typing something like:

function.name <- function(arguments)
{
  # computations on the arguments
  # some other code
}

So, in most cases, a function has a name, some arguments used as input to the function, within the () following the keyword ‘function’; a body, which is the code within the curly braces {}, where you carry out the computation; and can have one or more return values (the output). You define the function similarly to variables, by “assigning” the directive function(arguments) to the “variable” function.name, followed by the rest.

Remember that there are also functions that don’t carry names; these are called “anonymous functions”.

Make sure that the name you choose for the function is not an R reserved word such as if, for or function. You should also avoid picking the name of an existing function for your own UDF: your definition masks the existing one, which can cause you a lot of headaches because a call to that name no longer reaches the function that was loaded with one of the libraries.

One way to avoid this is to use the help system: if you get some information back by entering ?OurFunctionName, you know it is better not to use that name, because it has already been taken.

Note that it’s still possible to take the name of an existing function for your own UDF, but it’s not recommended: your UDF hides the existing function, which you can then reach by naming its package explicitly, as in package::function.
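A minimal sketch of what such a name clash looks like, assuming a hypothetical UDF that deliberately reuses the name of base R's sum(). It shows that the UDF is found first, that base::sum() still reaches the original, and that removing the UDF restores the usual behavior.

```r
# Hypothetical UDF that reuses the name of base R's sum()
sum <- function(...) "this is my own sum"

own <- sum(1, 2, 3)              # calls the UDF defined just above
original <- base::sum(1, 2, 3)   # `::` reaches the base version explicitly

# Remove the UDF so that plain sum() refers to base R again
rm(sum)
restored <- sum(1, 2, 3)

print(own)
print(original)
print(restored)
```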

Once you have defined the function in a function definition, you can call or use it somewhere else in the code. You can easily spot this in the following piece of code, where you define a function that computes the square of the argument and then call it after assigning a value for its argument:

# Define a simple function
myFirstFun<-function(n)
{
  # Compute the square of integer `n`
  n*n
}

# Assign `10` to `k`
k <- 10

# Call `myFirstFun` with that value
m <- myFirstFun(k)

# Call `m`
m


A few comments are necessary to illustrate the working of UDFs:

  • You first define the function and assign it to a variable, myFirstFun, using the keyword function. The function receives n as its argument (no type specification), and n exists only within the function. You used an integer, but n could also be a vector or a matrix; a string, however, would make the multiplication fail, because R only checks the type when n*n is evaluated;
  • In your snippet, when you call the function, you assign its result to a variable m. This is not strictly necessary, because at the console R prints the value of the last evaluation, but you do it for clarity and because you may want to re-use the result later. If you don’t assign it, the value is gone by the time the next command is run;
  • When you call the function, you can use an arbitrary variable, such as k in the code chunk above, to which you assign an integer value. This illustrates that the variable does not need to have the same name as the argument, because it is a different object;
  • You could also have used the same name, n. You’ll read more about this later!

Note, however, that such an n would not be the same n that is used within the function body. In fact, if you do the following:

# Assign `12` to `n`
n <- 12

# Assign `myFirstFun(n)` to `m`
m <- myFirstFun(n)

# Print `k`
print(k)

# Print `m`
print(m)

# Print `n`
print(n)


You’ll see that k and n remain at their initially defined value.

Actually, if you hadn’t defined the variable n before the last call, R would have thrown you an error, like this:

myFirstFun<-function(n)
{
  # Compute the square of integer `n`
  n*n
}

# Call the function with argument `n`
u <- myFirstFun(n)

# Call `u`
u


Tip: before trying the code chunk above, remember to clean the workspace, or else R will remember the previous values. If you work in RStudio, click the brush icon in the Environment pane; otherwise, run rm(list=ls()) first.

Alternatively, to remove specific elements from the workspace, you can use the function rm(x,y,z...) to remove the objects x,y,z from the environment. These can be variables, datasets, and also functions.

You should get the following error: "Error in myFirstFun(n) : object 'n' not found". Note that any other previously undefined variable would cause the same error. This is because R performs lazy evaluation: an argument is checked only when it is needed during execution. Similarly, if you had passed a character value such as k='a', you would get the error: Error in n * n : non-numeric argument to binary operator.

This also means that, if you had defined a second argument without passing a value for it, R would have complained only when necessary, that is, at the point where the body refers to it. You’ll read more about this in the section about arguments.
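The behavior with a second, unsupplied argument can be sketched as follows. The function names ignores_b and uses_b are hypothetical; tryCatch() is used only so that the error is captured as text instead of stopping the script.

```r
# `b` is declared but never used, so it is never evaluated
ignores_b <- function(a, b) {
  a * 2
}

# `b` is used in the body, so a missing value is noticed at that point
uses_b <- function(a, b) {
  a * b
}

ok <- ignores_b(5)   # works although `b` was not supplied

# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(uses_b(5), error = function(e) conditionMessage(e))

print(ok)
print(msg)
```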

So you have seen a first example of “scoping” or the visibility of variables.


[Figure: Functions in R - Scoping]


As shown in the figure above, an important feature of functions is that the variables used within them are local. This means that their scope lies within, and is limited to, the function itself, so they are invisible outside the function body.

Clearly, functions need a way to communicate to the external world, typically the piece of code that calls them, by means of one or more arguments (the ‘input’) and one or more values that the function returns to the caller (the ‘output’).

In your example, the function’s return value is stored in the variable m. Note that because all the objects within the function are local, they will not show up in your workspace. To make a value accessible outside the function body, the function has to return it, either as the value of its last evaluated expression or explicitly with return.

Thus, you can say that environments in R are nested: they are organized as a tree structure, which reflects the way R operates when it encounters a symbol. R works from the bottom up: when a symbol is not found in the current function environment, R looks in the next level up, and so on up to the global environment and the attached packages. Eventually, if the symbol is not found, R gives an error.

This matters when you try to inspect a variable defined within a function, for example when debugging. If a symbol with the same name exists in the script environment, that one is displayed; it is not the variable within the function, which remains invisible to the RStudio environment.

So, to inspect a variable within a function, a print statement can help.
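The lookup rules described above can be sketched as follows. The names scale_factor, scale_it and local_copy are made up for this illustration: the function finds scale_factor one level up, prints its local variable while it runs, and leaves nothing behind in the workspace.

```r
# Variable defined outside the function (illustrative name)
scale_factor <- 3

scale_it <- function(x) {
  local_copy <- x * scale_factor  # `scale_factor` is found one level up
  print(local_copy)               # inspect the local value while the function runs
  local_copy
}

result <- scale_it(2)

# `local_copy` existed only while the function was running
leaked <- exists("local_copy")
print(leaked)
```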

How Can You See Your R Function in RStudio?

While you develop your function, you can see it in the RStudio environment. An easy way to view its code is to type its name without the parentheses ().
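As a brief sketch, the same inspection can be done from code. Besides printing the function, base R offers formals() and body() to look at its arguments and its code; the variable names arg_names and fun_body are only illustrative.

```r
myFirstFun <- function(n) {
  n * n
}

# Typing the name alone, or calling print(), shows the source code
print(myFirstFun)

# formals() lists the arguments; body() returns the code between the braces
arg_names <- names(formals(myFirstFun))
fun_body <- body(myFirstFun)

print(arg_names)
print(fun_body)
```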

If you exit RStudio without closing the function’s script file and you save your environment upon exit, you’ll find the function in your workspace again, along with the script files that were open when you exited.

However, during the development of a slightly larger project, it is very likely that you wrote your function as an R script and saved it somewhere.

Calling R Functions Defined In Other Scripts

Maybe you planned a library of utility functions and wish to call one or more of these from another script that you are developing. How does this work?

First, note the simple way in which a function is loaded and executed in R. This might not be visible in the RStudio console, but it is in any R console. If the function code snippet myFirstFun seen above was saved into an R script file, say myIndepFun.R, you can load the function with the command source():

source("myIndepFun.R")

And this command also works from a script.

However, you might want to find a specific function, such as myFirstFun, within a script file MyUtils.R, which contains other utility functions.

In this case, you can first check with the function exists() whether myFirstFun is already defined in your session, and call source() only if it is not:

if(!exists("myFirstFun", mode = "function"))
    source("MyUtils.R")

If you misspelled or forgot the name of your file, you can use list.files() to retrieve the full names of all files with extension .R in a directory, say “R/MyFiles/”, and sapply() to load each of them:

sapply(list.files(pattern="[.]R$", path="R/MyFiles/", full.names=TRUE), source)
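A self-contained sketch of this workflow, assuming a temporary file as a stand-in for a utility script such as MyUtils.R. The function name myUtilFun is hypothetical, and the script is sourced only when that function is not defined yet.

```r
# Write a small utility script to a temporary file (stand-in for MyUtils.R)
script_path <- tempfile(fileext = ".R")
writeLines("myUtilFun <- function(n) n + 1", script_path)

# Load the script only if the function is not defined yet
if (!exists("myUtilFun", mode = "function")) {
  source(script_path)
}

value <- myUtilFun(41)
print(value)

# Clean up the temporary file
unlink(script_path)
```

By default source() evaluates the file in the global environment, which is why the function is available to the rest of the script afterwards.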

Nested Function Calls in R

The return statement is not required in a function but it is advisable to use it when the function performs several computations or when you want the value (and not the object that contains it!) to be accessible outside of the function body. As you have seen, the latter is not the default behavior.

Note that, as the name says, return ends the execution of the function and returns control to the code that called it.
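A minimal sketch of this early-exit behavior, using a hypothetical helper called safe_sqrt: for negative input, return() hands back NA immediately and the rest of the body is skipped.

```r
# Hypothetical helper: returns early for negative input
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)   # execution stops here for negative numbers
  }
  sqrt(x)        # only reached when x >= 0; last expression is the return value
}

a <- safe_sqrt(16)
b <- safe_sqrt(-4)

print(a)
print(b)
```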

Now consider the arguments: these can be of any type and can have default values inside the function. The latter provides an output even when explicit values are not passed to it. Finally, you can call another function within a function.

Let’s see these points in detail through the following examples.

First you define a vector v and a matrix M that you will use in the following:

# Define a numeric vector `v` of 5 elements
v <- c(1, 3, 0.2, 1.5, 1.7)

# Define a matrix `M`
M <- cbind( c(0.2, 0.9, 1), c(1.0, 5.1, 1), c(6, 0.2, 1), c(2.0, 9, 1))

Then you show an example of a function calling the first function that you made above. Note that you can pass only one argument in the call, even though the function was defined with two arguments: M is never used in the body, so R never asks for it. This time you also use return():

# Passing only 1 argument, nested call and return
mySecFun<-function(v,M)
{
  # Compute the square of each element of v into u
  u=c(0,0,0,0)
  for(i in 1:length(v))
    {
      u[i]=myFirstFun(v[i]);
    }
  return(u)
}

# Assign `mySecFun(v)` to `Sqv`
Sqv <- mySecFun(v)

# Call `Sqv`
Sqv


If you forget the return(), like in the code chunk below, you will not be able to access the output.

In fact, as shown by the last command, the output is NULL: even though the values returned by the inner function fill the vector u, the latter remains confined within the second function, because that function does not return it!

# Passing only 1 argument, nested call and no return: output unaccessible
mySecFun<-function(v,M)
{
  # Compute the square of each element of v into u
  u=c(0,0,0,0)
  for(i in 1:length(v))
  {
    # Call our first function
    u[i]=myFirstFun(v[i])
  }
}

# Assign `mySecFun(v)` to `Sqv`
Sqv <- mySecFun(v)

# Call `Sqv`
Sqv
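The reason for the NULL is that a for loop itself evaluates to NULL, so a function whose last expression is a loop hands NULL back to the caller. The sketch below contrasts two hypothetical functions, square_no_return and square_with_value, that differ only in their last line.

```r
v <- c(1, 3, 0.2, 1.5, 1.7)

# Ends with a for loop: the loop's value, NULL, is what the caller receives
square_no_return <- function(v) {
  u <- numeric(length(v))
  for (i in seq_along(v)) {
    u[i] <- v[i] * v[i]
  }
}

# Same loop, but `u` is the last expression, so it is returned
square_with_value <- function(v) {
  u <- numeric(length(v))
  for (i in seq_along(v)) {
    u[i] <- v[i] * v[i]
  }
  u
}

lost <- square_no_return(v)
kept <- square_with_value(v)

print(lost)
print(kept)
```

Writing return(u) instead of a bare u on the last line would behave the same way; which form to use is a matter of style.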


Function Arguments And Their Default

You have seen that function arguments are specified within the (). Let’s see a sequence of examples that compute some power of a value n passed as an argument, with a few variations on argument management:

# Define the function and specify the exponent, second argument directly
# Sets default of exponent to 2 (just square)
MyThirdFun <- function(n, y = 2)
{
  # Compute the power of n to the y
  n^y
}

# Specify both args
MyThirdFun(2,3)

# Just specify the first arg
MyThirdFun(2)

# Specify no argument: error!
# MyThirdFun()


In this case, you see that if you specify both arguments, the function just computes 2^3=8. When you pass only the first, n, the function uses the default y=2 to carry out the computation. If you omit both arguments, R throws an error. Uncomment the line to see the error!

Here, you specify the default of the second argument, your exponent, as a vector of values, to compute the powers of the given n for exponents between 0.05 and 1:

#  with variable exponent from 0.05 to 1 in steps of 0.01
MyThirdFun <- function(n, y = seq(0.05, 1, by = 0.01))
{
  # Compute the power of `n` to the `y`
  n^y
}

# As before, specify both args
MyThirdFun(2,3)

# Compute all possible according to given default
MyThirdFun(2)

# Specify no arguments: error!
# MyThirdFun()


Here, specifying just n (2 in the snippet) causes the function to compute all the powers according to the list of exponents specified.

The following is equivalent: here you do not set a default value in the argument list as above, but check whether the argument was supplied with an if test that uses the function missing():

# Equivalent alternative:
MyFourthFun <- function(n, y)
{
  if(missing(y))
  {
    y <- seq(0.05, 1, by = 0.01)
  }
  return(n^y)
}

MyFourthFun(2,3)

# Compute all possible according to given default
MyFourthFun(2)

# Specify no argument: error!
# MyFourthFun()


Ok, but you can do better! Use the default list as a checker for the user input, that is, to validate the input. The MyFourthFun function below checks whether the value of y is within the list: if it is, the code computes the power; otherwise, it prints a message. If an argument is missing, its default is used:

MyFourthFun <- function(n, y)
{
  # Uncomment `print()` calls to check passed values
  # print(n)
  # print(y)
  if(missing(n)) n=2;
  if(missing(y)) y=0.05;
  if(!y %in% seq(0.05, 1, by = 0.02)) print("value must be <= 1)")
  else return(n^y)
}

# Calculation will be carried out
MyFourthFun(2,0.07)

# Prints an error, `y` is not in the allowed list
MyFourthFun(2,3)

# Use `y` default
MyFourthFun(2)

# No arguments: both `n` and `y` defaults are used
MyFourthFun()


Note that in the code chunk above, you added two print() calls to check the input values passed to the function, because you won’t be able to see these from your workspace. So, uncomment these to do your checks!

The first call does what is expected; the second does not and complains that the exponent is not in the list; the third uses the default exponent; and the fourth uses both defaults.

There are many possible variations on this theme, but you have got the spirit of this!
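One such variation, sketched here under assumptions of my own, raises a real error with stop() instead of printing a message. The function name power_checked, the defaults in the signature and the simple rule y <= 1 are all hypothetical choices for this illustration, not the validation used above.

```r
# Hypothetical variant of the power function that raises a real error
power_checked <- function(n = 2, y = 0.05) {
  if (!is.numeric(n) || !is.numeric(y)) {
    stop("`n` and `y` must be numeric")
  }
  if (any(y > 1)) {
    stop("`y` must be <= 1")
  }
  n^y
}

good <- power_checked(2, 0.5)

# The invalid call is wrapped in tryCatch() so the script keeps running
bad <- tryCatch(power_checked(2, 3), error = function(e) conditionMessage(e))

print(good)
print(bad)
```

Unlike print(), stop() interrupts the function, so the caller cannot mistake the message for a valid result.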

Anonymous Functions in R

When you don’t give a name to a function, you are creating an anonymous function.

How is this possible?

This is possible because in R a function, like any other object, can be created and evaluated without assigning it or its result to a named variable, and it can be passed to any standard R function.

The syntax is slightly different from the ordinary UDF seen above because now you have a different parentheses approach:

  • First, you use () as usual immediately after the keyword function, to specify the argument, in the example x;
  • Secondly, a pair of parentheses () encloses the whole function(x) declaration and body;
  • Thirdly, after the previous construct, you specify in a further () the argument passed in the call.

It works like this:

# Anonymous function syntax
(function(x) x * 10)(10)

# equivalent (normal) way
fun<-function(x) x * 10

# Call `fun` and pass `10` as an argument
fun(10)


Why or when would you use an anonymous function?

As the syntax above indicates, you are doing everything in one shot: the declaration and the call in a one-line statement. So, although it is not very transparent to read, it is self-contained, and you use it because you don’t want to define yet another function somewhere else in your current script (or in an external script). You are dealing with a simple calculation when the need arises, and you probably will not use it anywhere else in your code, so it is not worth naming.
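A typical place for such a throwaway function is as an argument to one of the apply() family of functions. The sketch below passes an anonymous function to sapply(); the variable names values and times_ten are illustrative.

```r
values <- c(1, 2, 3)

# The anonymous function exists only for the duration of this call
times_ten <- sapply(values, function(x) x * 10)

print(times_ten)
```

R 4.1.0 and later also accept the shorthand \(x) x * 10 for the same anonymous function.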

Functions And Functional Programming in R

How could you end this post without mentioning the important fact that R is a functional programming language?

Yes, you read it right, though people usually associate the ‘functional’ attribute to trendy languages like Scala.

Hadley Wickham’s authoritative post on R puts it in these words: “you can do anything with functions that you can do with vectors: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function”.
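A short sketch of two of the points in that quote, storing functions in a list and passing a function as an argument. The names ops and apply_twice are made up for the example.

```r
# Store functions in a list (names are illustrative)
ops <- list(square = function(x) x^2, half = function(x) x / 2)

# Pass a function as an argument to another function
apply_twice <- function(f, x) f(f(x))

sixteen <- apply_twice(ops$square, 2)
quarter <- apply_twice(ops$half, 1)

print(sixteen)
print(quarter)
```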

A very interesting bit in this reading is the concept of closures: these are functions written by functions, and their main use lies in the access they keep to the environment in which they were created.

As you have seen above, a potentially tricky matter is whether variables remain visible when a function terminates its job. A closure is made of a function and its environment, and thus the data; it makes it possible to keep accessing the environment of the function that created it, even after that function has finished.
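A minimal closure sketch, assuming a hypothetical function factory called make_power: each function it returns remembers the value of exponent that was in place when it was created.

```r
# Function factory: returns a function that remembers `exponent`
make_power <- function(exponent) {
  force(exponent)            # evaluate the argument now, not lazily later
  function(x) x^exponent
}

square <- make_power(2)
cube <- make_power(3)

nine <- square(3)
eight <- cube(2)

# The captured value lives in the closure's enclosing environment
stored <- environment(square)$exponent

print(nine)
print(eight)
print(stored)
```

The call to force() is a defensive choice in this sketch: because of lazy evaluation, it makes sure exponent is fixed at the moment the closure is created.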

Summary

You have seen that functions constitute the most important programming construct in R, which is in fact a functional language. You can develop functions on your own, which are called “User Defined Functions (UDF)”. The first example introduced you to the notion of functions and variable visibility or “scoping” across environments.

In practice, when you develop your own functions, here are a few hints on how to avoid scoping problems and maintain clean code:

  • Similarly to loading functions from libraries, you can load and execute a function using the source() function;
  • The function environment (variables, other nested functions) is only accessible via the arguments passed to the function and the return values obtained from it;
  • Whenever possible, name functions (or “assign” them to a name). This is similar to naming a variable. The return statement is optional, although its presence makes clear where the exit point of the function is located;
  • Anonymous functions can be useful, but if you think you will carry out more than a simple calculation, and you plan to use the function again, just make a new named function; and,
  • In the same spirit, if a function is used repeatedly and has a general usage, perhaps it is worth putting it into a dedicated script (R file) together with its similar sister functions.

And perhaps after playing a bit with this, you decide that it is worth developing your own library of functions!