Introduction to Data Frames in R

This tutorial takes course material from DataCamp's Introduction to R course and lets you practice working with data frames.


If you want to take our Introduction to R course, here is the link.

What's a data frame?

You may remember from the chapter about matrices that all the elements that you put in a matrix should be of the same type. Back then, your data set on Star Wars only contained numeric elements.

When doing a market research survey, however, you often have questions such as:

  • 'Are you married?' or other 'yes/no' questions (logical)
  • 'How old are you?' (numeric)
  • 'What is your opinion on this product?' or other 'open-ended' questions (character)
  • ...

The output, namely the respondents' answers to the questions formulated above, is a data set of different data types. You will often find yourself working with data sets that contain different data types instead of only one.

A data frame has the variables of a data set as columns and the observations as rows. This will be a familiar concept for those coming from other statistical software packages such as SAS or SPSS.
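As a small illustration of the survey example, the sketch below stores one logical, one numeric and one character variable in a single data frame. The respondents, their answers and the names married, age, opinion and survey_df are all made up for this example and are not part of the course material.

```r
# Hypothetical survey answers from three respondents (made-up values).
married <- c(TRUE, FALSE, TRUE)                                   # logical
age <- c(34, 27, 45)                                              # numeric
opinion <- c("Works well", "Too expensive", "Would buy again")   # character

# Each vector becomes a column (variable); each position becomes a row (observation).
# stringsAsFactors = FALSE keeps the text column as character in every R version.
survey_df <- data.frame(married, age, opinion, stringsAsFactors = FALSE)

print(survey_df)

# Show the data type of each column.
print(sapply(survey_df, class))
```

A matrix could not hold these three columns without converting them all to a single type, which is why a data frame is the natural container here.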

Instructions

Print the built-in example data frame mtcars to the console by typing its name and running the code.
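Outside the interactive exercise, the same step can be reproduced in any R console. The class() call is an addition here, included only to confirm the object type.

```r
# Typing the name of a data frame prints it in full.
mtcars

# Confirm that mtcars is a data frame.
class(mtcars)
```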


Quick, have a look at your data set

Wow, that is a lot of cars!

Working with large data sets is not uncommon in data analysis. When you work with (extremely) large data sets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire data set.

So how do you do this in R? Well, the function head() enables you to show the first observations of a data frame (six by default). Similarly, the function tail() prints out the last observations in your data set.

Both head() and tail() print a top line called the 'header', which contains the names of the different variables in your data set.

Instructions

Call head() on the mtcars data set to have a look at the header and the first observations.
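A minimal sketch of both functions on mtcars follows. The variable first_three is a name chosen for this example only.

```r
# First observations of mtcars (six rows by default), preceded by the header.
head(mtcars)

# Last observations of mtcars.
tail(mtcars)

# The n argument changes how many rows are shown.
first_three <- head(mtcars, n = 3)
first_three
```

Passing n is handy when six rows are still too many to fit on screen, or too few to get a feel for the data.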


Have a look at the structure

Another method that is often used to get a rapid overview of your data is the function str(). The function str() shows you the structure of your data set. For a data frame it tells you:

  • The total number of observations (e.g. 32 car types)
  • The total number of variables (e.g. 11 car features)
  • A full list of the variable names (e.g. mpg, cyl ... )
  • The data type of each variable (e.g. num)
  • The first observations

Applying the str() function will often be the first thing that you do when receiving a new data set or data frame. It is a great way to get more insight into your data set before diving into the real analysis.

Instructions

Investigate the structure of mtcars. Make sure that you see the same numbers, variables and data types as mentioned above.
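The sketch below runs str() on mtcars. The dim() and names() calls are additions here, included as one way to cross-check the counts and variable names that str() reports.

```r
# Compact overview: observations, variables, names, types and first values.
str(mtcars)

# Cross-check: number of rows (observations) and columns (variables).
dim(mtcars)

# Cross-check: the variable names.
names(mtcars)
```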


Creating a data frame

Since using built-in data sets is not even half the fun of creating your own data sets, the rest of this chapter is based on your personally developed data set. Put your jet pack on because it is time for some space exploration!

As a first goal, you want to construct a data frame that describes the main characteristics of the eight planets in our solar system. According to your good friend Buzz, the main features of a planet are:

  • The type of planet (Terrestrial or Gas Giant).
  • The planet's diameter relative to the diameter of the Earth.
  • The planet's rotation period (about its own axis) relative to that of the Earth.
  • Whether the planet has rings (TRUE or FALSE).

After doing some high-quality research on Wikipedia, you feel confident enough to create the necessary vectors: name, type, diameter, rotation and rings; these vectors are provided with the exercise. The first element in each of these vectors corresponds to the first observation.

You construct a data frame with the data.frame() function. As arguments, you pass the vectors from before: they will become the different columns of your data frame. Because every column has the same length, the vectors you pass should also have the same length. But don't forget that it is possible (and likely) that they contain different types of data.
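To see why the lengths matter, the sketch below passes two vectors of incompatible lengths to data.frame() and captures the resulting error instead of stopping the script. The vectors ids and tags and the variable result are hypothetical names used only for this illustration.

```r
# Hypothetical vectors of unequal length (3 vs. 2 elements).
ids <- 1:3
tags <- c("a", "b")

# data.frame() cannot line these up into rows, so it raises an error.
# tryCatch() captures the error message as a character string.
result <- tryCatch(
  data.frame(ids, tags),
  error = function(e) conditionMessage(e)
)

print(result)
```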

Instructions

Use the function data.frame() to construct a data frame. Pass the vectors name, type, diameter, rotation and rings as arguments to data.frame(), in this order. Call the resulting data frame planets_df.
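The vectors and the data.frame() call below are taken from the exercise code embedded in the original page. The final str() call is an addition here, included to inspect the result.

```r
# Definition of vectors
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
          "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Create a data frame from the vectors; each vector becomes one column.
planets_df <- data.frame(name, type, diameter, rotation, rings)

# Inspect the structure of the new data frame.
str(planets_df)
```

Note that the column names are taken from the names of the vectors passed in, so the order of the arguments determines the order of the columns.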


