15 Easy Solutions To Your Data Frame Problems In R

Discover how to create a data frame in R, change column and row names, access values, attach data frames, apply functions and much more.

Jan 10, 2017 · 15 min read

R data frames regularly create somewhat of a furor on public forums like Stack Overflow and Reddit. Starting R users often experience problems with this particular data structure and it doesn’t always seem to be straightforward. But does it really need to be so?

Well, not necessarily.

With today’s post, DataCamp wants to show you that these R data structures don’t need to be hard: we offer you 15 easy, straightforward solutions to the most frequently occurring problems with data.frame. These issues have been selected from the most recent and most popular or upvoted Stack Overflow posts.

(To practice data frames in R, try the data frame chapter of DataCamp's introduction to R course.)

The Root: What’s an R Data Frame Exactly?

With the data frame, R offers you a great first step by allowing you to store your data in easy-to-read, rectangular grids. Each row of these grids corresponds to the measurements or values of one instance, while each column is a vector containing data for a specific variable.

This means that a data frame’s rows can contain values of different types: numeric, character, logical, etc.

As you can see below, each instance, listed in the first unnamed column with a number, has certain characteristics that are spread out over the remaining three columns. Each column needs to consist of values of the same type, since they are data vectors: as such, the breaks column only contains numerical values, while the wool and tension columns have characters as values that are stored as factors.

In case you’re wondering, this data is about the number of breaks in yarn during weaving :).

Remember that factors are variables that can only contain a limited number of different values. As such, they are often called categorical variables.

head(warpbreaks)

Maybe you have already noticed that this data structure resembles a matrix, except that its columns don’t need to contain values of the same type, while a matrix does require this.

Data frames also have similarities with lists, which are basically collections of components. More precisely, a data frame is a list whose components are vectors of the same length. As such, it can be seen as a special type of list, and it can be accessed either like a matrix or like a list.
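A quick console check of both views, using the built-in warpbreaks data set from the example above (any data frame would do):

```r
# warpbreaks ships with base R (package "datasets")
is.list(warpbreaks)            # a data frame is a list underneath
length(warpbreaks)             # number of list components = number of columns

# The same three values, accessed list-style and matrix-style
warpbreaks[["breaks"]][1:3]    # list-style: extract the column, then index it
warpbreaks[1:3, "breaks"]      # matrix-style: [rows, columns]
```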

If you want more information or if you just want to review and take a look at a comparison of the five general data structures in R, watch the small video below:

R Data Structures (1) from DataCamp.

As you can see, there are different data structures that impose different requirements on how the data is stored. Data frames are particularly handy to store multiple data vectors, which makes it easier to organize your data, to apply functions to it and to save your work.

It’s a bit like having a single spreadsheet in which all columns have equal lengths!

The Basics: Questions and Solutions

How to Create a Simple Data Frame in R

Even though looking at built-in examples of this data structure, such as esoph, is interesting, it can easily get more exciting!

How?

By practising with your own examples, of course! You can do this very easily by making some vectors first:

Died.At <- c(22,40,72,41)
Writer.At <- c(16, 18, 36, 36)
First.Name <- c("John", "Edgar", "Walt", "Jane")
Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
Sex <- c("MALE", "MALE", "MALE", "FEMALE")
Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26","1817-07-18")

Next, you just combine the vectors that you made with the data.frame() function:

# Make the data frame with the help of data.frame()
writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)

# Use str() to get more info about `writers_df`
str(writers_df)

Remember that this type of data structure requires all variables to have the same length. Check that you have put an equal number of elements in all the c() calls that you assign to the vectors, and that you have enclosed strings in double quotes ("").
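A small sketch of what happens when the lengths don’t match; the vectors here are made up for illustration. data.frame() recycles a shorter vector only when its length divides the number of rows evenly (a single value, for example), so a mismatch like 3 versus 2 stops with an error:

```r
# Equal lengths: fine
ok <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))

# A single value is recycled to fill every row
recycled <- data.frame(x = 1:4, group = "A")

# Lengths 3 and 2 cannot be reconciled, so data.frame() raises an error;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
print(msg)
```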

Also, note that in R versions before 4.0.0, data.frame() converts character variables to factors (categorical variables) by default; since R 4.0.0, the default is stringsAsFactors = FALSE, so they stay character vectors. Use the str() function to learn more about writers_df.

If you’re more interested in inspecting the first and last rows of writers_df, you can use the head() and tail() functions, respectively.

With the pre-4.0.0 default, you see that the First.Name, Second.Name, Sex and Date.Of.Death variables of writers_df have all been read in as factors.
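Since R 4.0.0, data.frame() no longer converts character vectors to factors by default, so what str() shows depends on your R version. Setting stringsAsFactors explicitly gives the same result everywhere; a sketch using two of the vectors from above:

```r
First.Name <- c("John", "Edgar", "Walt", "Jane")
Sex <- c("MALE", "MALE", "MALE", "FEMALE")

# Pre-4.0.0 behaviour: character vectors become factors
as_factors <- data.frame(First.Name, Sex, stringsAsFactors = TRUE)
str(as_factors)

# R >= 4.0.0 default: character vectors stay character
as_strings <- data.frame(First.Name, Sex, stringsAsFactors = FALSE)
str(as_strings)
```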

But do you really want this?

For the variables First.Name and Second.Name, you don’t want this. You can use the I() function to insulate them: I() makes data.frame() keep its argument “as is” instead of converting it. In other words, by slightly changing the definitions of the vectors First.Name and Second.Name with the addition of the I() function, you make sure that the proper names are not interpreted as factors.
You can keep the Sex vector as a factor, because there is only a limited number of possible values that this variable can have.
You don’t want a factor for the variable Date.Of.Death either. It would be better if its values were stored as dates. You can wrap this vector in the as.Date() function to make sure this happens.
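Putting those changes together, the vector definitions could look like this (the I() and as.Date() calls match the ones the exercises use). One assumption of this sketch: Sex is wrapped in factor() explicitly, so that it becomes a factor regardless of your R version’s stringsAsFactors default.

```r
Died.At <- c(22, 40, 72, 41)
Writer.At <- c(16, 18, 36, 36)

# I() tells data.frame() to keep the names "as is" (never converted to factors)
First.Name <- I(c("John", "Edgar", "Walt", "Jane"))
Second.Name <- I(c("Doe", "Poe", "Whitman", "Austen"))

# A categorical variable with few possible values: an explicit factor
Sex <- factor(c("MALE", "MALE", "MALE", "FEMALE"))

# as.Date() parses "YYYY-MM-DD" strings into Date objects
Date.Of.Death <- as.Date(c("2015-05-10", "1849-10-07", "1892-03-26", "1817-07-18"))

writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)
str(writers_df)
```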

Functions that read data from files, such as read.table(), read.csv() and read.delim(), also give you back a data frame as the result. This way, files like the one below, or files with other delimiters, are converted into data frames when you read them into R with these functions:

22, 16, John, Doe, MALE, 2015-05-10
40, 18, Edgar, Poe, MALE, 1849-10-07
72, 36, Walt, Whitman, MALE, 1892-03-26
41, 36, Jane, Austen, FEMALE, 1817-07-18
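To try this without creating a file, you can pass the lines above to read.csv() through its text argument. The column names below are an assumption (the sample has no header row), and strip.white = TRUE removes the space that follows each comma:

```r
csv_lines <- "22, 16, John, Doe, MALE, 2015-05-10
40, 18, Edgar, Poe, MALE, 1849-10-07
72, 36, Walt, Whitman, MALE, 1892-03-26
41, 36, Jane, Austen, FEMALE, 1817-07-18"

writers_from_csv <- read.csv(
  text = csv_lines,
  header = FALSE,          # the sample has no header row
  strip.white = TRUE,      # drop the blank after each comma
  col.names = c("Died.At", "Writer.At", "First.Name",
                "Second.Name", "Sex", "Date.Of.Death")
)

# read.csv() does not recognise dates by itself; convert them explicitly
writers_from_csv$Date.Of.Death <- as.Date(writers_from_csv$Date.Of.Death)
str(writers_from_csv)
```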

Check out our tutorial if you want to know more about how you can read and import Excel files into R. Alternatively, you could check out the Rdocumentation page on read.table.

How to Change a Data Frame’s Row and Column Names

Data frames also have a names attribute, which holds the names of the variables that you have included. In other words, it acts as the header of your data.

You already set it when you made writers_df: data.frame() took the column names from the names of the vectors that you passed to it.

You see that the names of the variables Died.At, Writer.At, First.Name, Second.Name, Sex and Date.Of.Death appear:
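You can print those names with names(); a minimal, self-contained version that rebuilds writers_df from the same vectors:

```r
Died.At <- c(22, 40, 72, 41)
Writer.At <- c(16, 18, 36, 36)
First.Name <- c("John", "Edgar", "Walt", "Jane")
Second.Name <- c("Doe", "Poe", "Whitman", "Austen")
Sex <- c("MALE", "MALE", "MALE", "FEMALE")
Date.Of.Death <- c("2015-05-10", "1849-10-07", "1892-03-26", "1817-07-18")

writers_df <- data.frame(Died.At, Writer.At, First.Name, Second.Name, Sex, Date.Of.Death)

# data.frame() took each column name from the name of the vector passed in
names(writers_df)
```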

Now that you see the names of writers_df, you’re not so sure if these are efficient or correct. To change the names that appear, you can easily continue using the names() function.

Make sure, though, that you have a number of arguments in the c() function that is equal to the number of variables that you have included into writers_df.

In this case, since there are six variables, Died.At, Writer.At, First.Name, Second.Name, Sex and Date.Of.Death, you want six arguments in the c() function.

Otherwise, the remaining columns will get NA as their name.

Note also how the arguments of the c() function are inputted as strings!

# Assign different names to the columns of `writers_df`
names(writers_df) <- c("Age.At.Death", "Age.As.Writer", "Name", "Surname", "Gender", "Death")

# See the result
names(writers_df)

Tip: try to leave out the two last arguments from the c() function and see what happens!
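Here is what that tip produces, sketched on a small made-up data frame with three columns and only two new names:

```r
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Only two names for three columns: the missing name is filled with NA
names(df) <- c("x", "y")
names(df)
```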

Note that you can also access and change the column and row names with the functions colnames() and rownames(), respectively:

# Assign different column names to `writers_df`
colnames(writers_df) <- c("Age.At.Death", "Age.As.Writer", "Name", "Surname", "Gender", "Death")

# Assign row names to `writers_df`
rownames(writers_df) <- c("ID1", "ID2", "ID3", "ID4")

# Return `writers_df`
writers_df

As you already know, this data structure has similarities to matrices: its size is determined by how many rows and columns it contains.

To check how many rows and columns you have in writers_df, you can use the dim() function:

# Return the number of rows and columns
dim(writers_df)

# Retrieve the number of rows
dim(writers_df)[1]

# Retrieve the number of columns
dim(writers_df)[2]

The result of this function is represented as [1] 4 6. Just like a matrix, the dimensions are defined by the number of rows, followed by the number of columns.

Note that you can also retrieve only the number of rows or columns by indexing the result of dim() with [1] or [2], respectively.

You can also retrieve the number of rows and columns of writers_df with the functions nrow() and ncol(), respectively:

# Use `nrow()` to retrieve the number of rows
nrow(writers_df)

# Use `ncol()` to retrieve the number of columns
ncol(writers_df)

# Use `length()` to retrieve the number of columns
length(writers_df)

Note that, since the data structure is also similar to a list, you could also use the length() function to retrieve the number of columns.

How to Access and Change a Data Frame’s Values

There are two main ways in which you can access and change these values. In this section, you’ll see and practice both of them!

…Through the Variable Names

Now that you have retrieved and set the names of writers_df, you want to take a closer look at the values that are actually stored in it.

First, you can access them by entering the data frame’s name in combination with the variable name:

Note that if you change one of the values in the vector Age, this change will not be incorporated into writers_df.

With this method of accessing the values, you just create a copy of a certain variable!

That’s why changes to the copy do not change the data frame’s own variables.
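A minimal sketch of this copy behaviour; the vector name Age is just an illustration:

```r
writers_df <- data.frame(Age.At.Death = c(22, 40, 72, 41),
                         Age.As.Writer = c(16, 18, 36, 36))

# Copy one column into a separate vector
Age <- writers_df$Age.At.Death

# Changing the copy...
Age[1] <- 99

# ...leaves the data frame untouched
Age[1]                       # 99
writers_df$Age.At.Death[1]   # still 22
```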

…Through the [,] and $ Notations

You can also access writers_df’s values by using the [,] notation:

Remember that dimensions are defined as rows by columns.
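A few [,] lookups on a cut-down version of writers_df (only four of its columns, for brevity):

```r
writers_df <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name = c("John", "Edgar", "Walt", "Jane"),
  Surname = c("Doe", "Poe", "Whitman", "Austen"),
  stringsAsFactors = FALSE
)

writers_df[1, ]          # row 1, all columns (a one-row data frame)
writers_df[, 1]          # all rows of column 1 (a plain vector)
writers_df[2, 4]         # row 2, column 4
writers_df[, "Surname"]  # a column selected by name instead of position
```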

An alternative to the [,] notation is a notation with $, just like this:

writers_df$Age.At.Death

Note also that you can use these notations to perform mathematical operations on the values and, by assigning the result back, to change them.
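For example, with a two-column writers_df (a simplified stand-in for the one above), arithmetic on a column returns new values, and assigning the result back stores them. The Years.Writing column is hypothetical, added only for illustration:

```r
writers_df <- data.frame(Age.At.Death = c(22, 40, 72, 41),
                         Age.As.Writer = c(16, 18, 36, 36))

# Arithmetic on columns returns a new vector; writers_df is not modified yet
writers_df$Age.At.Death - writers_df$Age.As.Writer

# Assigning the result to a (new, hypothetical) column stores it in the data frame
writers_df$Years.Writing <- writers_df$Age.At.Death - writers_df$Age.As.Writer

# The [,] notation works the same way: convert ages at death to months
writers_df[, "Age.At.Death"] <- writers_df[, "Age.At.Death"] * 12
```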

If you really want to get your hands dirty and change some of the values of writers_df, you can use the [,] notation to change the values one by one:
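A sketch of single-value edits with [,], on the same simplified data (the new values are arbitrary):

```r
writers_df <- data.frame(Age.At.Death = c(22, 40, 72, 41),
                         Age.As.Writer = c(16, 18, 36, 36))

writers_df[1, 1] <- 23                  # by position: row 1, column 1
writers_df[4, "Age.As.Writer"] <- 20    # by row position and column name
writers_df
```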

Why and How to Attach Data Frames

The $ notation is pretty handy, but it can become very annoying when you have to type it each time that you want to work with your data.

The attach() function offers a solution to this: it takes a data frame as an argument and places it in the search path at position 2.

So unless there are objects in position 1 (your global environment) with exactly the same names as the columns of the data frame that you passed in, you can call those columns directly by their names.

Note that the search path is the order in which R looks through environments, such as the global environment, attached packages and attached data frames, when you refer to an object by name. You can look it up with the search() function.

# Look up the search path
search()

# Attach `writers_df`
attach(writers_df)

# Alternatively, use `with()` to evaluate an expression inside `writers_df` without attaching it
with(writers_df, mean(Age.At.Death))

# Return `writers_df`
writers_df

Alternatively, you can use the with() function, which evaluates an expression inside writers_df without attaching it at all; the expression to evaluate is passed as its second argument.

You get a message that tells you that “The following objects are masked by .GlobalEnv:”.

This is because you have objects in your global environment that have the same names as your data frame’s columns. Those objects could be the vectors that you created above, if you didn’t change their names.

You have two solutions to this:

Don’t create any objects with those names in your global environment. This is more a solution for those of you who imported their data with read.table(), read.csv() or read.delim(), and it isn’t really appropriate for this case.
Rename the columns of the data frame so that there’s no conflict. This is the solution that was applied in this tutorial: rename your columns with the names() or colnames() functions.

Note that if all else fails, you can just remember to always refer to your column names with the $ notation!

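Now that you have unmasked the objects, you can safely execute the following command to work with writers_df’s variables directly. Keep in mind, though, that the assignment does not change writers_df itself: R saves the modified Age.At.Death as a new vector in your global environment.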

Age.At.Death <- Age.At.Death-1
Age.At.Death
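The sketch below, on a one-column writers_df, shows that this assignment leaves the data frame unchanged, and how to change the column for real. It ends with detach(), which removes the data frame from the search path again:

```r
writers_df <- data.frame(Age.At.Death = c(22, 40, 72, 41))

attach(writers_df)
# This creates a NEW vector called Age.At.Death in the global environment
Age.At.Death <- Age.At.Death - 1
unchanged <- writers_df$Age.At.Death    # still 22, 40, 72, 41
detach(writers_df)

# To really change the column, assign into the data frame itself
writers_df$Age.At.Death <- writers_df$Age.At.Death - 1
```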

How to Apply Functions to Data Frames

Now that you have successfully made and modified writers_df by putting a header in place, you can start applying functions to it!

In some cases where you want to calculate stuff, you might want to put the numeric data in a separate data frame:
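With a writers_df that uses the renamed columns from the previous sections (only four of them here), the two numeric columns can be selected by position, as the exercises do, or by name:

```r
writers_df <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name = c("John", "Edgar", "Walt", "Jane"),
  Gender = factor(c("MALE", "MALE", "MALE", "FEMALE"))
)

# Columns 1 and 2 hold the numeric data
Ages <- writers_df[, 1:2]

# Equivalent, but robust to reordering: select the columns by name
Ages <- writers_df[, c("Age.At.Death", "Age.As.Writer")]
str(Ages)
```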

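Only then can you reliably get, for example, the mean and the median of your numeric data: apply() converts a data frame to a matrix before doing anything else, and if character columns are included, every value in that matrix becomes a character string.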

You can do this with the apply() function. Its first argument should be your smaller data frame, in this case Ages. The second argument, MARGIN, designates whether the function is applied over rows (1) or columns (2).

In this case, you want to calculate the median and mean of the variables Age.At.Death and Age.As.Writer, which designate columns in writers_df.

The last argument then specifies the exact calculations that you want to do on your data:

# Apply the median to each column of `Ages` (MARGIN = 2)
apply(Ages, 2, median)
# ...or to each row (MARGIN = 1)
apply(Ages, 1, median)

# Or you can apply the mean to each column of `Ages`
apply(Ages, 2, mean)

Do you want to know more about the apply() function and how to use it?

Check out DataCamp’s Intermediate R course, which teaches you, among other things, how to make your R code more efficient and readable using this function, along with the rest of the apply() family of functions.

Surpassing the Basics: More Questions, More Answers

Now that you have been introduced to the basic pitfalls, it’s time to look at some problems, questions or difficulties that you might have already had while working with these data structures more intensively. If you’re new to this topic, the following section will allow you to step up your data frame game.

All the more reason to get started now!

How to Create an Empty Data Frame

The easiest way to create an empty data frame is probably to call data.frame() without any arguments and assign the result to a variable:

ab <- data.frame()
ab

You can then start filling ab by using the [,] notation.

Be careful, however, because it’s easy to make errors while doing this!

Note that you don’t see any column names in this empty data frame. If you do want them, you can initialize ab with empty vectors, like this:

Age <- numeric()
Name <- character()
ID <- integer()
Gender <- factor()
Date <- as.Date(character())
# Pass the vectors as separate arguments; wrapping them in c() would merge them into one column
ab <- data.frame(Age, Name, ID, Gender, Date)
ab
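Once ab has named columns, one way to fill it is to bind rows onto it with rbind(), or to write a new row just past the end with the [,] notation. A sketch with two of the columns from above and made-up values:

```r
ab <- data.frame(Age = numeric(), Name = character(), stringsAsFactors = FALSE)

# rbind() appends a one-row data frame with matching column names
ab <- rbind(ab, data.frame(Age = 22, Name = "John", stringsAsFactors = FALSE))

# Assigning to row nrow(ab) + 1 adds a row in place; a list keeps each column's type
ab[nrow(ab) + 1, ] <- list(40, "Edgar")
ab
```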

How to Extract Rows and Columns, Subsetting your Data Frame

Subsetting or extracting specific rows and columns is an important skill in order to surpass the basics that have been introduced in step two, because it allows you to easily manipulate smaller sets of your original data.

You basically extract those values from the rows and columns that you need in order to optimize the data analyses you make.

It’s easy to start subsetting with the [,] notation that was described in step two:

# Subset the `writers_df` data frame
writer_names_df <- writers_df[1:4, 3:4]

# Return the subsetted `writer_names_df`
writer_names_df

# Define the subset with variable names "Name" and "Surname"
writer_names_df2 <- writers_df[1:4, c("Name", "Surname")]

# Return `writer_names_df2`
writer_names_df2

Note that you can also define this subset with the variable names.

Tip: be careful when you are subsetting just one column!

By default, R simplifies the result. A subset of a single column is returned as a vector instead of a data frame, which is usually not what you want.

To make sure that this doesn’t happen, you can add the argument drop=FALSE:
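Here is a minimal sketch of the difference. So that the block runs on its own, it rebuilds a two-column stand-in for writers_df, with the names from this tutorial:

```r
# Small stand-in for writers_df so the example is self-contained
writers_df <- data.frame(
  Name    = c("John", "Edgar", "Walt", "Jane"),
  Surname = c("Doe", "Poe", "Whitman", "Austen"),
  stringsAsFactors = FALSE
)

# Selecting a single column simplifies the result to a plain vector
names_vec <- writers_df[, "Name"]
class(names_vec)   # "character"

# drop = FALSE keeps the result as a one-column data frame
names_df <- writers_df[, "Name", drop = FALSE]
class(names_df)    # "data.frame"
dim(names_df)      # 4 1
```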

Next, you can try subsetting with the subset() function:

# Subset with ranges of values
young_writers_df <- subset(writers_df, Age.At.Death <= 40 & Age.As.Writer >= 18)

# Return `young_writers_df`
young_writers_df

# Or subset on a particular value `Jane`
jane_writers_df <- subset(writers_df, Name == "Jane")

# Return `jane_writers_df`
jane_writers_df

# Subset with `grep()`
forty_writers_df <- writers_df[grep("4", writers_df$Age.At.Death),]

# Return `forty_writers_df`
forty_writers_df

You can also subset with grep(). The last part of the code chunk above does this. grep() returns the positions of the rows whose Age.At.Death value contains the digit “4” (here, 40 and 41), and those positions are then used to select the rows.

Note that by subsetting, you basically stop considering certain values. As a result, some levels of a factor may no longer occur in your data, for example when you only keep the MALE members of writers_df.

Notice how all factor levels of this column still remain present, even though you have created a subset:
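The sketch below shows this effect. It rebuilds a reduced writers_df and creates Gender explicitly as a factor. This is an assumption: since R 4.0.0, data.frame() no longer converts character columns to factors by default. It then keeps only the male writers and inspects the levels:

```r
writers_df <- data.frame(
  Name   = c("John", "Edgar", "Walt", "Jane"),
  Gender = factor(c("MALE", "MALE", "MALE", "FEMALE"))
)

# Keep only the male writers
male_writers_df <- subset(writers_df, Gender == "MALE")

# Only "MALE" values remain, but the unused "FEMALE" level is still there
table(male_writers_df$Gender)
levels(male_writers_df$Gender)   # "FEMALE" "MALE"
```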

To remove the factor levels that are no longer present, you can apply factor() to the column again:
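Here is a minimal sketch, again with a rebuilt writers_df in which Gender is an explicit factor. It also shows base R’s droplevels(), an alternative the text above does not mention, which drops unused levels from every factor column of a data frame at once:

```r
writers_df <- data.frame(
  Name   = c("John", "Edgar", "Walt", "Jane"),
  Gender = factor(c("MALE", "MALE", "MALE", "FEMALE"))
)
male_writers_df <- subset(writers_df, Gender == "MALE")

# Re-applying factor() rebuilds the levels from the values actually present
male_writers_df$Gender <- factor(male_writers_df$Gender)
levels(male_writers_df$Gender)   # "MALE"

# droplevels() does the same for all factor columns of a data frame
male_writers_df2 <- droplevels(subset(writers_df, Gender == "MALE"))
levels(male_writers_df2$Gender)  # "MALE"
```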

How to Remove Columns and Rows from a Data Frame

If you want to remove an entire column, you can assign NULL to it:
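Here is a minimal sketch on a rebuilt, three-column writers_df, showing both the $ and the [[ ]] notation:

```r
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("John", "Edgar", "Walt", "Jane")
)

# Assigning NULL to a column deletes it
writers_df$Age.As.Writer <- NULL

# The same works with [[ ]] and a column name
writers_df[["Age.At.Death"]] <- NULL

names(writers_df)   # "Name"
```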

To remove rows, the procedure is a bit more complicated. You define a logical vector that states, for every row, whether to keep it (TRUE) or not (FALSE).

Then, you apply this vector to writers_df:

# Define the rows you want to keep
rows_to_keep <- c(TRUE, FALSE, TRUE, FALSE)

# Subset `writers_df` with the rows to keep
limited_writers_df <- writers_df[rows_to_keep,]

# Return `limited_writers_df`
limited_writers_df

# Subset `writers_df` with the rows not to keep
less_writers_df <- writers_df[!rows_to_keep,]

# Return `less_writers_df`
less_writers_df

# Subset with thresholds
forty_sth_writers <- writers_df[writers_df$Age.At.Death > 40,]

# Return `forty_sth_writers`
forty_sth_writers

Note that you can also do the opposite by adding !, which negates the logical vector so that you keep the rows marked FALSE instead. You can also subset with thresholds. In the last part of the code chunk above, you only keep the writers who were older than forty when they died.

How to add Rows and Columns to a Data Frame

Much in the same way that you used the [,] and $ notations to access and change single values, you can also easily add columns to writers_df:
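Here is a minimal sketch on a rebuilt writers_df. The new column names Years.Writing and Is.Writer are hypothetical, chosen only for illustration:

```r
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("John", "Edgar", "Walt", "Jane")
)

# Add a column with $: here, a value computed from two existing columns
writers_df$Years.Writing <- writers_df$Age.At.Death - writers_df$Age.As.Writer

# Or with [ , ]: a single value is recycled to every row
writers_df[, "Is.Writer"] <- TRUE

writers_df
```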

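Appending rows to an existing data frame is somewhat more complicated.

The easiest way is to first create the new row, respecting the column variables that have been defined in writers_df, and then bind it to the original data frame with the rbind() function. Create the row as a one-row data frame (or a list) rather than with c(). Because c() converts all values to a single type, a row with mixed values would turn them all into character: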
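Here is a minimal sketch with a rebuilt, four-column writers_df. The writer “Ann Example” in the new row is made up for illustration:

```r
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("John", "Edgar", "Walt", "Jane"),
  Surname       = c("Doe", "Poe", "Whitman", "Austen"),
  stringsAsFactors = FALSE
)

# The new row as a one-row data frame with the same column names
# (hypothetical writer, for illustration only)
new_row <- data.frame(Age.At.Death = 50, Age.As.Writer = 25,
                      Name = "Ann", Surname = "Example",
                      stringsAsFactors = FALSE)

# rbind() matches the columns by name and appends the row at the bottom
writers_df <- rbind(writers_df, new_row)

nrow(writers_df)                 # 5
class(writers_df$Age.At.Death)   # "numeric": the column type is preserved
```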

Why and how to Reshape an R Data Frame from Wide to Long Format and Vice Versa

When you have multiple values, spread out over multiple columns, for the same instance, your data is in the “wide” format.

On the other hand, your data is in the “long” format when there is one row per observation, that is, per combination of instance and variable. You therefore have multiple rows per instance.

Let’s illustrate this with an example. Long data looks like this:

# Define rows
Subject <- c(1,2,1,2,2,1)
Gender <- c("M", "F", "M", "F", "F","M")
Test <- c("Read", "Write", "Write", "Listen", "Read", "Listen")
Result <- c(10, 4, 8, 6, 7, 7)

# Make `observations_long` with `data.frame()`
observations_long <- data.frame(Subject, Gender, Test, Result)

# Return `observations_long`
observations_long

As you can see, there is one row for each combination of Subject and Test, so each subject appears on several rows. A lot of statistical tests favor this format.

The data would look like the following in the wide format:

# Define columns
Subject <- c(1,2)
Gender <- c("M", "F")
Read <- c(10, 7)
Write <- c(8, 4)
Listen <- c(7, 6)

# Make `observations_wide`
observations_wide <- data.frame(Subject, Gender, Read, Write, Listen)

# Return `observations_wide`
observations_wide

Here, each test has its own column, and each subject appears on exactly one row.

Since different functions may require you to input your data either in “long” or “wide” format, you might need to reshape your data set.

In base R, there are two main options: you can use the stack() function or the reshape() function.

The former works well for simple data frames, while the latter is more often used on more complex ones because it offers more options. For example, reshape() keeps identifier columns such as Subject and Gender, which stack() drops.

Make sure to keep on reading to know more about the differences in possibilities between the stack() and reshape() functions!

Using `stack()` for Simply Structured Data Frames

The stack() function basically concatenates or combines multiple vectors into a single vector, along with a factor that indicates where each observation originates from.

To go from wide to long format, you will have to stack your observations, since you want one row per observation, with multiple rows per instance.

In this case, you want to merge the columns Read, Write and Listen: their names go into one column and their values into another:
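Here is a minimal sketch on the observations_wide data from above, rebuilt so the block runs on its own. stack() keeps only the stacked values and an ind factor that records their source column. The sketch therefore repeats Subject and Gender once per test and binds them back. The names id_cols and observations_stacked are hypothetical:

```r
observations_wide <- data.frame(Subject = c(1, 2), Gender = c("M", "F"),
                                Read = c(10, 7), Write = c(8, 4), Listen = c(7, 6))

# stack() turns the three score columns into one `values` column
# plus an `ind` factor recording which column each value came from
stacked <- stack(observations_wide[, c("Read", "Write", "Listen")])

# stack() drops the other columns, so repeat Subject and Gender once per test
id_cols <- observations_wide[rep(seq_len(nrow(observations_wide)), times = 3),
                             c("Subject", "Gender")]
observations_stacked <- cbind(id_cols, stacked)
rownames(observations_stacked) <- NULL

observations_stacked
```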

To go from long to wide format, you will need to unstack your data, which makes sense because you want to have one row per instance with each value present as a different variable.

Note here that you want to disentangle the Result and Test columns:
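Here is a minimal sketch on observations_long, rebuilt so the block runs on its own. It relies on one detail of unstack(): within each Test group, values are paired up by their position. The sketch therefore sorts by Subject first, so that each row of the result belongs to one subject. Like stack(), unstack() keeps only the values, so Subject and Gender are added back afterwards. The names sorted_long and observations_unstacked are hypothetical:

```r
Subject <- c(1, 2, 1, 2, 2, 1)
Gender  <- c("M", "F", "M", "F", "F", "M")
Test    <- c("Read", "Write", "Write", "Listen", "Read", "Listen")
Result  <- c(10, 4, 8, 6, 7, 7)
observations_long <- data.frame(Subject, Gender, Test, Result)

# unstack() pairs values by position within each Test group,
# so sort by Subject first to keep each subject's scores on one row
sorted_long <- observations_long[order(observations_long$Subject), ]

# Result ~ Test: put the Result values into one column per value of Test
scores <- unstack(sorted_long, Result ~ Test)

# unstack() keeps only the scores, so add the identifying columns back
observations_unstacked <- cbind(unique(sorted_long[, c("Subject", "Gender")]), scores)
rownames(observations_unstacked) <- NULL

observations_unstacked
```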

Using `reshape()` for Complex Data Frames

The reshape() function is part of the stats package. It is similar to the stack() function, but a little more elaborate. Read on to see how reshaping your data works with reshape():

To go from a wide to a long data format, you can first start off by entering the reshape() function.

The first argument should always be your original wide data set.

In this case, you can specify that you want to input the observations_wide to be converted to a long data format.

Then, you start adding other arguments to the reshape() function:

Include a list of variable names that define the different measurements through varying. In this case, you store the scores of specific tests in the columns “Read”, “Write” and “Listen”.
Next, add the argument v.names to specify the name that you want to give to the variable that contains these values in your long data set. In this case, you want to combine all scores for all reading, writing and listening tests into one variable Score.
You also need to give a name to the variable that describes the different measurements, with the argument timevar. In this case, this column contains the types of tests that you give to your students, so you can call it “Test”.
Then, add the argument times to specify the values that the new column “Test” takes, one for each of the varying columns: “Read”, “Write” and “Listen”.
You’re finally there! Specify the target format of the data with the argument direction, which is “long” here.
Additionally, you can specify new row names with the argument new.row.names.

Tip: try leaving out this last argument and see what happens!
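If you put the arguments listed above together, a call could look like the following sketch. It rebuilds observations_wide so that it runs on its own, and names the value column Score as described above. Because no idvar is given, reshape() also adds an id column that numbers the original rows:

```r
observations_wide <- data.frame(Subject = c(1, 2), Gender = c("M", "F"),
                                Read = c(10, 7), Write = c(8, 4), Listen = c(7, 6))

long_reshape <- reshape(observations_wide,
                        varying = c("Read", "Write", "Listen"),  # columns holding the measurements
                        v.names = "Score",                       # name of the new value column
                        timevar = "Test",                        # name of the column describing the measurement
                        times = c("Read", "Write", "Listen"),    # values that Test takes
                        direction = "long",
                        new.row.names = 1:6)                     # replaces row names such as "1.Read"

long_reshape
```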

For comparison, the opposite conversion, from long to wide, can also be done with the dcast() function from the reshape2 package, which is covered in more detail further below:

# Import `reshape2`
library(reshape2)

# Convert to a wide format with `dcast()`
wide_reshaped2 <- dcast(observations_long,
                        Subject + Gender ~ Test,
                        value.var="Result")

# Return `wide_reshaped2`
wide_reshaped2

From long to wide, the steps are similar. First, you call the reshape() function and give it its first argument, which is the data set that you want to reshape. The other arguments are as follows:

timevar specifies that the variable Test, which describes the different tests that you give to your students, should be spread out over separate columns.
idvar lists the variables that identify each instance and should be left as they are, in this case Subject and Gender.
You don’t need to name the variable Result: reshape() treats the remaining column as the values to spread out, which is why the new columns are named Result.Read, Result.Write and Result.Listen.
You specify the direction of the reshaping, which in this case is wide!

# Reshape to wide format
wide_reshape <- reshape(observations_long,
                        timevar = "Test",
                        idvar = c("Subject", "Gender"),
                        direction = "wide")

# Return `wide_reshape`
wide_reshape

Note that if you want you can also rename or sort the results of these new long and wide data formats! You can find detailed instructions below.

Reshaping Data Frames with `tidyr`

The tidyr package allows you to “easily tidy data with the spread() and gather() functions”, which is exactly what you need to reshape your data! (In tidyr 1.0.0 and later, these two functions are superseded by pivot_longer() and pivot_wider(), but they still work.)

If you want to convert from wide to long format, the principle is similar to that of reshape(): you use the gather() function and specify its arguments:
1. Your data set is the first argument to the gather() function.
2. Then, you specify the name of the new column that will hold the names Read, Write and Listen. In this case, you want to call it something like Test or Test.Type.
3. You enter the name of the new column that will hold all the values of the Read, Write and Listen columns.
4. You indicate which columns should be combined into one. In this case, that is the range of columns from Read to Listen.

# Import `tidyr`
library(tidyr)

# Gather the wide data to a long format
long_tidyr <- gather(observations_wide,
                     Test,
                     Result,
                     Read:Listen)

# Return `long_tidyr`
long_tidyr

Note how the last argument specifies the columns in the same way as you did to subset writers_df or to select the writers_df columns on which you wanted to perform mathematical operations.

You can also just specify the columns individually like this:

library(tidyr)

long_tidyr <- gather(observations_wide,
                     Test,
                     Result,
                     Read,
                     Write,
                     Listen)
long_tidyr

Tip: try this variant in the gather() example above to check that it gives the same result!

The opposite direction, from long to wide format, is very similar to the function above, but this time with the spread() function:

# Import `tidyr`
library(tidyr)

# Spread the long data to wide format with `spread()`
wide_tidyr <- spread(observations_long,
                     Test,
                     Result)

# Return `wide_tidyr`
wide_tidyr

Again, you take as the first argument your data set. Then, you specify the column that contains the new column names.

In this case, that is Test.

Lastly, you input the name of the column that contains the values that should be put into the new columns.

Reshaping Data Frames with `reshape2`

This package allows you to “flexibly reshape data”. To go from a wide to a long data format, you use its melt() function.

This function is straightforward: besides your data set, it takes the id.vars argument, which is similar to the idvar argument of the reshape() function. It specifies which columns should be left alone by the function.

# Import `reshape2`
library(reshape2)

# Melt the wide observations to long format
long_reshaped2 <- melt(observations_wide,
                       id.vars=c("Subject", "Gender"),
                       measure.vars=c("Read", "Write", "Listen"),
                       variable.name="Test",
                       value.name="Result")

# Return `long_reshaped2`
long_reshaped2

But, as you will have noted, there are a couple more arguments specified in the code chunk above:

measure.vars specifies the columns that should be combined into a single column. If you leave out this argument, melt() uses all columns that are not listed in id.vars.
variable.name specifies the name of the new column that holds the original column names. If you don’t specify this argument, this column will be named “variable”.
value.name specifies the name of the column in which the values, here the test results, are stored. If you leave out this argument, this column will be named “value”.

You can also go from a long to a wide format with the reshape2 package with the dcast() function.

This is fairly easy: you first give in your data set, as always. Then, you pass a formula that describes the new layout. The columns that you want to keep as they are go on the left of the ~.

In this case, you want to keep Subject and Gender as they are, so the left-hand side is Subject + Gender. The column Test, however, you want to split into new columns, so it goes on the right of the ~. The last argument of this function is value.var, which names the column that holds the values of the different tests, here Result. This is the dcast() call that was shown earlier in this tutorial, right after the walkthrough of reshape() from wide to long format.

How to Sort a Data Frame

Sorting by columns might seem tricky, but this can be made easy by either using R’s built-in order() function or by using a package.

R’s Built-In `order()` Function

You can for example sort by one of the dataframe’s columns. You order the rows according to the values that are stored in the variable Age.As.Writer:

writers_df[order(writers_df$Age.As.Writer), ]

If you want to sort the values starting from high to low, you can just add the extra argument decreasing, which can only take logical values.

Remember that logical values are either TRUE or FALSE.
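For example, the following minimal sketch rebuilds writers_df and sorts it from the highest to the lowest Age.As.Writer. It refers to the column through writers_df$ so that it does not depend on a separate Age.As.Writer vector existing in your workspace:

```r
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("John", "Edgar", "Walt", "Jane")
)

# Sort from the highest to the lowest Age.As.Writer
by_age_desc <- writers_df[order(writers_df$Age.As.Writer, decreasing = TRUE), ]

by_age_desc
```

Walt and Jane both started writing at 36. If the order of such ties matters to you, pass a second variable to order() to break them.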

Another way is to wrap the order() call in the rev() function. As its name suggests, rev() returns the reversed version of its argument, which is order(Name) in this case:
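Here is a minimal sketch with a rebuilt writers_df. It uses writers_df$Name instead of relying on a global Name vector:

```r
writers_df <- data.frame(
  Name    = c("John", "Edgar", "Walt", "Jane"),
  Surname = c("Doe", "Poe", "Whitman", "Austen"),
  stringsAsFactors = FALSE
)

# order() gives the row positions in ascending alphabetical order of Name;
# rev() reverses those positions, so the rows come out from Z to A
reverse_by_name <- writers_df[rev(order(writers_df$Name)), ]

reverse_by_name$Name   # "Walt" "John" "Jane" "Edgar"
```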

You can also add a - in front of the numeric variable that you pass to order():
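Here is a minimal sketch on a rebuilt writers_df:

```r
writers_df <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Name         = c("John", "Edgar", "Walt", "Jane"),
  stringsAsFactors = FALSE
)

# Negating a numeric column flips its sort direction
oldest_first <- writers_df[order(-writers_df$Age.At.Death), ]

oldest_first$Age.At.Death   # 72 41 40 22
```

This trick is mostly useful when you sort on several variables in different directions, for example order(-writers_df$Age.At.Death, writers_df$Name).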

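Note that the - trick only works on numeric variables. Also keep in mind that order() sorts a factor by the order of its levels rather than alphabetically by its labels. If you want an alphabetical sort, you can convert the factor to a character vector first: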

writers_df[order(as.character(writers_df$Gender)), ]

You can also sort on two variables. In that case, order() needs to have two arguments, so that you first sort by the first argument of the order() function and then on the second argument.

You’ll see an example of this further on in the tutorial.

Sorting with `dplyr`

The dplyr package, known for its abilities to manipulate data, has a specific function that allows you to sort rows by variables.

dplyr’s function to make this happen is arrange().

The first argument of this function is the data set that you want to sort, while the following arguments are the variables that you want to sort by.

In this case you sort first on the variable Age.At.Death and then on Age.As.Writer:

# Import `dplyr`
library(dplyr)

# Sort the data by `Age.At.Death` and `Age.As.Writer`
data2 <- arrange(writers_df, Age.At.Death, Age.As.Writer)

# Return `data2`
data2

# Order by `Age.At.Death` and `Age.As.Writer`
writers_df[with(writers_df, order(Age.At.Death, Age.As.Writer)), ]

# Sort according to descending `Age.At.Death`
desc_sorted_data <- arrange(writers_df, desc(Age.At.Death))

# Return `desc_sorted_data`
desc_sorted_data

You can also get the same result without dplyr by passing order() to the with() function, as in writers_df[with(writers_df, order(Age.At.Death, Age.As.Writer)), ].

To sort a column in descending order with arrange(), wrap it in desc(), as in arrange(writers_df, desc(Age.At.Death)).
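If you would rather not load dplyr, base R's order() can sort in descending order too. The sketch below uses a hypothetical, trimmed-down copy of writers_df (three of its columns only) and shows two equivalent ways to sort Age.At.Death from high to low, plus a sort that mixes directions.

```r
# Hypothetical stand-in for writers_df with only the columns needed here
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("John", "Edgar", "Walt", "Jane"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of arrange(writers_df, desc(Age.At.Death)):
# negating a numeric column reverses its sort order
desc_base <- writers_df[order(-writers_df$Age.At.Death), ]

# decreasing = TRUE gives the same result here
desc_base2 <- writers_df[order(writers_df$Age.At.Death, decreasing = TRUE), ]

# Mixed directions: Age.As.Writer descending, ties broken by Age.At.Death ascending
mixed <- writers_df[order(-writers_df$Age.As.Writer, writers_df$Age.At.Death), ]
print(mixed)
```

Negation only works for numeric columns, while decreasing = TRUE reverses every sort key at once, which is why the mixed-direction sort uses the minus sign.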

Are you interested in doing much more with the dplyr package? Check out our Data Manipulation in R with dplyr course, which will teach you how to perform sophisticated data manipulation tasks using dplyr!

Also, check out our data manipulation with dplyr cheat sheet.

Other Options to Sort Data Frames

Many other packages also offer sorting functions; this section only gives a short overview of some of them. First, the taRifx package offers sort.data.frame(), which lets you call sort() directly on a data frame. Here it sorts the values of Age.At.Death in decreasing order:

library(taRifx)
sorted_data <- sort(writers_df, decreasing=TRUE, ~Age.At.Death)
sorted_data

Second, the doBy package offers the function orderBy(). In the formula below, the minus sign orders the values of Age.At.Death from high to low first; ties are then broken by the values of Age.As.Writer, in ascending order.

library(doBy)
sorted_data_two <- orderBy(~-Age.At.Death+Age.As.Writer, writers_df)
sorted_data_two

How to Merge Data Frames

Merging Data Frames on Column Names

You can use the merge() function to join two, but only two, data frames.

Let’s say you have a second data frame, data2, with a variable Age.At.Death that holds exactly the same values as the Age.At.Death variable of writers_df. You then want to merge the two on the basis of this variable.

Because Age.At.Death is the only column name the two data frames share, merge() uses it as the key by default, so merge(writers_df, data2) is all you need.
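As a minimal sketch of this default merge, the example below builds a trimmed-down, hypothetical writers_df and a data2 whose Location values are made up for illustration. merge() matches rows on the shared Age.At.Death column and appends the Location column.

```r
# Hypothetical, trimmed-down version of writers_df
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("John", "Edgar", "Walt", "Jane"),
  stringsAsFactors = FALSE
)

# Hypothetical second data frame: same Age.At.Death values, one new column
data2 <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Location     = c(5, 6, 7, 8)
)

# merge() joins on every column name the two data frames share (here: Age.At.Death)
merged <- merge(writers_df, data2)
print(merged)
```

By default merge() sorts the result on the key column (sort = TRUE), so the rows come out in the order 22, 40, 41, 72 rather than in the original row order.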

Tip: check what happens if you change the order of the two arguments of the merge() function!

This default way of merging, which keeps only the rows whose key values occur in both data frames, is equivalent to an inner join in SQL.

Unfortunately, you’re not always this lucky. In many cases, some of the column names or variable values will differ, which makes it hard to follow the simple, standard procedure described above. In addition, you may not always want to merge in this default way.

In the following, some of the most common issues are listed and solved!

What If… (some of) the Data Frame’s Column Values are Different?

If (some of) the values of the variable on which you merge differ, you have a small problem: by default, merge() only keeps the rows whose values match in both data frames, so any row without a matching value in the other data frame is silently dropped from the result.

Consider the following example:

> data2
  Age.At.Death Location
1           21        5
2           39        6
3           71        7
4           40        8

Only one of these Age.At.Death values, 40, also occurs in writers_df; the other three do not match any of the values defined there.

No worries, the merge() function provides extra arguments to solve this problem.

Setting the argument all.x = TRUE tells merge() to keep every row of the first data frame, writers_df, even when its Age.At.Death value has no match in data2.

Rows of writers_df whose Age.At.Death value does occur in data2 get the corresponding Location value. For all other rows, Location is filled up with NA values.

This join corresponds to a left outer join in SQL. Note that the default value of the all.x argument is FALSE, which means that, normally, only rows with matching values of the merging variable are kept.

Likewise, you can specify the argument all.y = TRUE to keep every row of data2, adding a row for each row of data2 that has no matching row in writers_df.

For those who are familiar with SQL, this type of join corresponds to a right outer join.
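To compare the join types side by side, the sketch below builds data2 with the Age.At.Death and Location values printed above and a hypothetical two-column writers_df. It also uses all = TRUE, which combines all.x and all.y into a full outer join.

```r
# Hypothetical two-column version of writers_df
writers_df <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Name         = c("John", "Edgar", "Walt", "Jane"),
  stringsAsFactors = FALSE
)

# data2 with the values printed above: only Age.At.Death == 40 also occurs in writers_df
data2 <- data.frame(
  Age.At.Death = c(21, 39, 71, 40),
  Location     = c(5, 6, 7, 8)
)

inner <- merge(writers_df, data2)                # default: matching rows only (inner join)
left  <- merge(writers_df, data2, all.x = TRUE)  # keep every writers_df row (left outer join)
right <- merge(writers_df, data2, all.y = TRUE)  # keep every data2 row (right outer join)
full  <- merge(writers_df, data2, all = TRUE)    # keep rows from both (full outer join)

# Number of rows produced by each join type
sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```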

What If… Both Data Frames Have the same Column Names?

What if your two data frames have exactly the same two variables, with or without the same values?

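You can choose to merge on all the shared variables and keep all values from both data frames by setting all = TRUE; rows whose values differ are then added to the resulting data frame as extra rows.

Or you can choose to merge on one specific variable only, for example by = "Age.At.Death", so that only the rows whose ages of death correspond are combined. Any other shared column then appears twice in the result, once with the suffix .x and once with .y.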
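A minimal sketch of both options, using two small hypothetical data frames (writers_a and writers_b, invented for illustration) that share the columns Age.At.Death and Location but hold partly different values:

```r
# Two hypothetical data frames with identical column names
writers_a <- data.frame(Age.At.Death = c(22, 40, 72), Location = c(5, 6, 7))
writers_b <- data.frame(Age.At.Death = c(22, 40, 41), Location = c(5, 9, 8))

# Merge on all shared columns; all = TRUE keeps every row from both data frames
all_rows <- merge(writers_a, writers_b, all = TRUE)
print(all_rows)

# Merge on Age.At.Death only; the other shared column gets .x / .y suffixes
by_age <- merge(writers_a, writers_b, by = "Age.At.Death")
print(by_age)
```

In the first call only the row (22, 5) matches on both columns, so the other four distinct rows are appended. In the second call the ages 22 and 40 match, and Location.x and Location.y show that the two sources disagree for age 40.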

What If… the Data Frames’ Column Names are Different?

Lastly, what if the variable on which you merge has a different name in each data frame?

In that case, you tell merge() which column to use in each data frame through the arguments by.x (for the first data frame) and by.y (for the second).
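Here is a short sketch of by.x and by.y, assuming a hypothetical data3 whose key column is called Death.Age instead of Age.At.Death (the name and Location values are made up for illustration):

```r
# Hypothetical two-column version of writers_df
writers_df <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Name         = c("John", "Edgar", "Walt", "Jane"),
  stringsAsFactors = FALSE
)

# Hypothetical data frame whose key column has a different name
data3 <- data.frame(Death.Age = c(22, 40, 72, 41), Location = c(5, 6, 7, 8))

# by.x names the key in the first data frame, by.y the key in the second
merged <- merge(writers_df, data3, by.x = "Age.At.Death", by.y = "Death.Age")
print(merged)
```

The key column in the result keeps the name from the first data frame, Age.At.Death.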

Merging Data Frames on Row Names

You can also merge two data frames that contain different sets of columns but share some row names. The merge() function and its arguments come to the rescue!

Consider this second example:

Address <- c("50 West 10th", "77 St. Marks Place", "778 Park Avenue")
Married <- c("YES", "NO", "YES")
limited_writers_df <- data.frame(Address, Married)

As you can see, this data frame contains three rows, with the row names 1 to 3, and two columns that are not in writers_df. To merge the two on their row names, you set the by argument of the merge() function to 0, which stands for the row names.

Since you want to keep all rows from both data frames and add the new columns to the result, you also set the all argument to TRUE, as in merge(writers_df, limited_writers_df, by = 0, all = TRUE).
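The sketch below runs that merge on a trimmed-down, hypothetical writers_df. Because writers_df has four rows and limited_writers_df only three, row 4 has no partner and its Address and Married fields are filled with NA.

```r
# Hypothetical two-column version of writers_df (row names 1 to 4)
writers_df <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Name         = c("John", "Edgar", "Walt", "Jane"),
  stringsAsFactors = FALSE
)

# Second data frame with row names 1 to 3 and two new columns
limited_writers_df <- data.frame(
  Address = c("50 West 10th", "77 St. Marks Place", "778 Park Avenue"),
  Married = c("YES", "NO", "YES"),
  stringsAsFactors = FALSE
)

# by = 0 (equivalently by = "row.names") joins on the row names
by_rows <- merge(writers_df, limited_writers_df, by = 0, all = TRUE)
print(by_rows)
```

merge() stores the matched row names in an extra character column called Row.names at the left of the result.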

Fields of rows that occur in only one of the two data frames are filled with NA values. You can easily solve this by removing those rows.

How to do this is discussed in the next section.

How to Remove Data Frame Rows and Columns with NA-Values

To remove all rows that contain NA-values, one of the easiest options is to use the na.omit() function, which takes your data frame as an argument.

For example, if you pass the result of the row-name merge from the previous section to na.omit(), the row that was padded with NA values is dropped entirely.

If you only want to remove the rows that have NA values in some of the columns, complete.cases() is the better choice: it returns TRUE for every row that has no missing values in the columns you give it, and you can use that logical vector to subset the data frame.

For instance, you might want to keep all rows for which the values of the columns Age.As.Writer and Name are complete, regardless of NA values in other columns.
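A minimal sketch of the difference, recreating the row-name merge with a hypothetical, trimmed-down writers_df. na.omit() looks at every column, while complete.cases() only looks at the columns you select.

```r
# Hypothetical, trimmed-down writers_df and the three-row limited_writers_df
writers_df <- data.frame(
  Age.At.Death  = c(22, 40, 72, 41),
  Age.As.Writer = c(16, 18, 36, 36),
  Name          = c("John", "Edgar", "Walt", "Jane"),
  stringsAsFactors = FALSE
)
limited_writers_df <- data.frame(
  Address = c("50 West 10th", "77 St. Marks Place", "778 Park Avenue"),
  Married = c("YES", "NO", "YES"),
  stringsAsFactors = FALSE
)

# Row "4" has no partner, so its Address and Married fields are NA
merged <- merge(writers_df, limited_writers_df, by = 0, all = TRUE)

# na.omit() drops every row with an NA in any column
no_na <- na.omit(merged)

# complete.cases() on selected columns only checks those columns
complete_subset <- merged[complete.cases(merged[, c("Age.As.Writer", "Name")]), ]

c(merged = nrow(merged), na_omit = nrow(no_na), complete_cases = nrow(complete_subset))
```

Here na.omit() keeps three rows, whereas the complete.cases() subset keeps all four, because Age.As.Writer and Name have no missing values.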

How to Convert Between Data Structures

Convert Lists or Matrices to Data Frames

Lists or matrices that comply with the restrictions that the data frame structure imposes can be coerced into data frames with the as.data.frame() function.

Remember that a data frame is similar to a matrix, except that its columns can be of different types. It is also similar to a list, where each column is an element of the list and all elements have the same length. Any matrices or lists that you want to convert need to satisfy these restrictions.

For example, the matrix A can be converted because each column contains values of the numeric data type. You enter the matrix A as an argument to the as.data.frame() function:

A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)
A_df <- as.data.frame(A)
A_df

You can follow the same procedures for lists like the one that is shown below:

n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
x <- list(n, s, b, 3)   # the single value 3 is recycled to length 3
x_df <- as.data.frame(x)
x_df
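To see the equal-length restriction in action, the sketch below converts one list that satisfies it and one that does not; the element names a, b and c are made up for illustration. A length-1 element is recycled, but lengths 3 and 2 cannot be reconciled, so as.data.frame() raises an error, which tryCatch() captures here.

```r
# Elements of equal length (or length 1, which is recycled) convert cleanly
ok <- as.data.frame(list(a = c(2, 3, 5), b = c("aa", "bb", "cc"), c = 3))

# Lengths 3 and 2 cannot be turned into equal-length columns, so this fails
bad <- tryCatch(
  as.data.frame(list(a = c(2, 3, 5), b = c("aa", "bb"))),
  error = function(e) conditionMessage(e)
)

print(dim(ok))
print(bad)
```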

Convert Data Frames to Matrices or Lists

To make the opposite move, that is, to convert data frames to matrices and lists, you first have to check for yourself whether this is possible. How many dimensions does writers_df have, and how many different data types does it contain?

Rewatch the small animation of the introduction if you’re not sure what data structure to pick.

Once you have an answer, you can use the functions as.matrix() and as.list() to convert writers_df to a matrix or a list, respectively:

# Make a matrix out of writers_df
writers_matrix <- as.matrix(writers_df)

# Return writers_matrix
writers_matrix

# Make a list out of writers_df
writers_list <- as.list(writers_df)

# Return writers_list
writers_list

If you specifically want a numeric matrix, you can use the function data.matrix(), or apply as.numeric() to every column with sapply() and wrap the result in as.matrix():

# Use data.matrix() to make a numeric matrix
writers_matrix <- data.matrix(writers_df)

# Return writers_matrix
writers_matrix

# Use as.matrix() with sapply() to make a numeric matrix
writers_numeric_matrix <- as.matrix(sapply(writers_df, as.numeric))

# Return writers_numeric_matrix
writers_numeric_matrix

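Note that the current writers_df contains a mixture of data types, so the two approaches give different results. data.matrix() converts character columns to factors and then to integer codes, and converts Date columns to numbers. as.numeric(), in contrast, turns every character value into NA (with a warning), so the sapply() approach introduces NA values in the resulting matrix.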
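The sketch below makes that difference visible on a hypothetical four-column version of writers_df (character columns are kept as plain character vectors via stringsAsFactors = FALSE). It counts the NA values per column in each resulting matrix.

```r
# Hypothetical mixed-type version of writers_df
writers_df <- data.frame(
  Age.At.Death = c(22, 40, 72, 41),
  Name         = c("John", "Edgar", "Walt", "Jane"),
  Gender       = c("MALE", "MALE", "MALE", "FEMALE"),
  Death        = as.Date(c("2015-05-10", "1849-10-07", "1892-03-26", "1817-07-18")),
  stringsAsFactors = FALSE
)

# data.matrix(): character columns become factor codes, Dates become day counts
dm <- data.matrix(writers_df)

# as.numeric() on character columns yields NA; suppress the coercion warnings
sm <- suppressWarnings(as.matrix(sapply(writers_df, as.numeric)))

colSums(is.na(dm))  # no NA values
colSums(is.na(sm))  # Name and Gender are entirely NA
```

Factor codes follow the alphabetical order of the levels, so in dm FEMALE is coded 1 and MALE 2. Keep in mind that such codes are labels, not meaningful quantities.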

From Data Structures to Data Analysis, Data Manipulation and Data Visualization

Working with this R data structure is just the beginning of your data analysis!

If this tutorial has gotten you thrilled to dig deeper into programming with R, make sure to check out our free interactive Introduction to R course.

Those of you who are already more advanced with R and who want to take your skills to a higher level might be interested in our courses on data manipulation and data visualization.

Go to our course overview and take a look!
