If you want to take our Introduction to R course, here is the link.
What's a data frame?
You may remember from the chapter about matrices that all the elements you put in a matrix must share the same type. Back then, your Star Wars data set contained only numeric elements.
When doing a market research survey, however, you often have questions such as:
- 'Are you married?' or other 'yes/no' questions (logical)
- 'How old are you?' (numeric)
- 'What is your opinion on this product?' or other 'open-ended' questions (character)
- ...
The output, namely the respondents' answers to the questions above, is a data set that mixes different data types. You will often find yourself working with data sets like this, which contain several data types instead of only one.
A data frame stores the variables of a data set as columns and the observations as rows. Unlike a matrix, each column can hold its own data type. This will be a familiar concept for those coming from other statistical software packages such as SAS or SPSS.
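To make this concrete, here is a minimal R sketch of the survey scenario. The respondents, their answers and the column names married, age and opinion are made-up illustrative data. The only point is that one data frame can hold a logical, a numeric and a character column side by side.

```r
# Hypothetical answers from three survey respondents (illustrative data only)
married <- c(TRUE, FALSE, TRUE)                             # 'yes/no' question -> logical
age <- c(34, 27, 51)                                        # 'How old are you?' -> numeric
opinion <- c("Great value", "Too expensive", "Works fine")  # open-ended -> character

# Each vector becomes a column (variable); each position becomes a row (respondent).
# stringsAsFactors = FALSE keeps text as character on R versions before 4.0 as well.
survey_df <- data.frame(married, age, opinion, stringsAsFactors = FALSE)

print(survey_df)
# Every column keeps its own type
print(sapply(survey_df, class))
```

By contrast, combining the same three vectors with cbind() would produce a character matrix, because a matrix can hold only one type.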
Instructions
Click 'Submit Answer'. The data from the built-in example data frame mtcars will be printed to the console.
# Print out built-in R data frame
mtcars
Quick, have a look at your data set
Wow, that is a lot of cars!
Working with large data sets is not uncommon in data analysis. When you work with (extremely) large data sets and data frames, your first task as a data analyst is to develop a clear understanding of their structure and main elements. Therefore, it is often useful to show only a small part of the entire data set.
So how do you do this in R? The function head() shows the first observations of a data frame (six by default). Similarly, the function tail() prints the last observations in your data set.
Both head() and tail() print a top line called the 'header', which contains the names of the different variables in your data set.
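A short sketch of both functions on mtcars, which ships with base R. The n argument, which the text above does not mention, controls how many rows are returned (the default is 6).

```r
# mtcars comes with R's built-in datasets package, so nothing needs to be loaded
first_rows <- head(mtcars)         # first 6 rows (the default)
last_rows <- tail(mtcars, n = 3)   # n sets how many rows to show

print(first_rows)
print(last_rows)

# The header line printed above each result is simply the column names
print(names(mtcars))
```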
Instructions
Call head() on the mtcars data set to have a look at the header and the first observations.
# Call head() on mtcars
head(mtcars)
Have a look at the structure
Another function that is often used to get a quick overview of your data is str(). It shows you the structure of your data set. For a data frame, it tells you:
- The total number of observations (e.g. 32 car types)
- The total number of variables (e.g. 11 car features)
- A full list of the variable names (e.g. mpg, cyl, ...)
- The data type of each variable (e.g. num)
- The first observations of each variable
Applying the str() function will often be the first thing you do when you receive a new data set or data frame. It is a great way to gain insight into your data set before diving into the real analysis.
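Each item in the list above can also be retrieved on its own with a base R function. The sketch below uses nrow(), ncol(), names() and sapply(..., class). These functions are not part of the exercise. Here they cross-check what str() reports for mtcars.

```r
# Pull out, one at a time, the pieces of information that str() reports
n_obs <- nrow(mtcars)               # number of observations (rows)
n_vars <- ncol(mtcars)              # number of variables (columns)
var_names <- names(mtcars)          # variable names
var_types <- sapply(mtcars, class)  # data type of each variable

cat("Observations:", n_obs, "\n")
cat("Variables:", n_vars, "\n")
print(var_names)
print(var_types)

# str() shows all of the above, plus the first few values, in a single call
str(mtcars)
```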
Instructions
Investigate the structure of mtcars. Make sure that you see the same numbers, variables and data types as mentioned above.
# Investigate the structure of mtcars
str(mtcars)
Creating a data frame
Since using built-in data sets is not even half the fun of creating your own data sets, the rest of this chapter is based on your personally developed data set. Put your jet pack on because it is time for some space exploration!
As a first goal, you want to construct a data frame that describes the main characteristics of eight planets in our solar system. According to your good friend Buzz, the main features of a planet are:
- The type of planet (Terrestrial or Gas Giant).
- The planet's diameter relative to the diameter of the Earth.
- The planet's rotation period about its own axis relative to that of the Earth (a negative value means the planet spins in the opposite direction).
- Whether the planet has rings (TRUE or FALSE).
After doing some high-quality research on Wikipedia, you feel confident enough to create the necessary vectors: name, type, diameter, rotation and rings. These vectors are already defined in the exercise code. The first element in each of these vectors corresponds to the first observation.
You construct a data frame with the data.frame() function. As arguments, you pass the vectors from before: they will become the different columns of your data frame. Because every column of a data frame has the same length, the vectors you pass should also have the same length. But don't forget that it is possible (and likely) that they contain different types of data.
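Here is a minimal sketch of both points, using the first four planets from this exercise. Each vector becomes a column that keeps its own type. Vectors of mismatched lengths make data.frame() stop with an error. The shortened vectors and the names small_df, too_short and msg are only for illustration.

```r
# First four planets only, taken from the exercise vectors
name <- c("Mercury", "Venus", "Earth", "Mars")
diameter <- c(0.382, 0.949, 1, 0.532)
rings <- c(FALSE, FALSE, FALSE, FALSE)

# Each vector becomes one column; stringsAsFactors = FALSE keeps name as character
small_df <- data.frame(name, diameter, rings, stringsAsFactors = FALSE)
print(sapply(small_df, class))

# A 3-element vector cannot fill 4 rows, so data.frame() raises an error
too_short <- c(58.64, -243.02, 1)
msg <- tryCatch(
  data.frame(name, too_short),
  error = function(e) conditionMessage(e)
)
print(msg)
```

One nuance: data.frame() does recycle a shorter vector when its length divides the number of rows evenly (a single value, for example). Keeping all vectors the same length, as recommended above, avoids surprises.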
Instructions
Use the function data.frame() to construct a data frame. Pass the vectors name, type, diameter, rotation and rings as arguments to data.frame(), in this order. Call the resulting data frame planets_df.
# Definition of vectors
name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", 
          "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Create a data frame from the vectors
planets_df <- data.frame(name, type, diameter, rotation, rings)