Course: Introduction to R
Chapter 1: Intro to Basics
- R does normal arithmetic using the standard operations: +, -, *, /, ^, %%, where the last one is modulo arithmetic.
- You assign variables with the backwards arrow, e.g. my_variable <- 5. (my_variable = 5 also works at the top level, but <- is the conventional assignment operator in R.)
- You can check the data type by using the class() function.
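A minimal sketch of the three points above. The values and the name remainder are illustrative, not from the course.

```r
# Arithmetic operators
5 + 3    # addition
5 - 3    # subtraction
5 * 3    # multiplication
5 / 3    # division
5 ^ 3    # exponentiation
17 %% 5  # modulo: the remainder of 17 divided by 5

# Assignment with the arrow operator
my_variable <- 5
remainder <- 17 %% my_variable   # 2

# Inspect data types with class()
class(my_variable)   # "numeric"
class("five")        # "character"
class(TRUE)          # "logical"
```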
Chapter 2: Vectors
- To create a vector, use the combine function c().
- The names() function is used to give names to a vector. See below for an example.
fibonacci_vector <- c(1, 1, 2, 3, 5)
fibonacci_names <- c("F_1", "F_2", "F_3", "F_4", "F_5")
names(fibonacci_vector) <- fibonacci_names
fibonacci_vector
- To find the sum of a vector, use the sum() function.
- The na.rm argument controls how missing values (NA) are handled, not non-numeric types: sum(x, na.rm = TRUE) removes the NAs before summing.
- Use brackets to pull out specific entries in a vector. For example, if my_vector <- c(2, 4, 6, 8, 10), then my_vector[2] gives the 2nd element in the vector, which is 4. (Counting starts at 1, not 0.)
- Using a:b in the brackets selects the elements at indices a through b, including both endpoints.
- The mean() function calculates the average value of the vector.
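The bullets on sum(), na.rm, indexing, and mean() can be sketched together as follows. The vector scores and the other names are hypothetical, chosen so that one entry is missing.

```r
# Hypothetical scores; the NA stands for a missing measurement
scores <- c(2, 4, NA, 8, 10)

sum(scores)                             # NA, because one value is missing
total <- sum(scores, na.rm = TRUE)      # 24, the NA is dropped first
average <- mean(scores, na.rm = TRUE)   # 6, i.e. 24 / 4

second <- scores[2]     # 4 (indexing starts at 1)
middle <- scores[2:4]   # 4 NA 8 (indices 2 through 4, inclusive)
```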
Selection by Comparison:
- You can use vectors to choose only the specific values you want, based on certain conditions. See the example below:
fibonacci_vector >= 2
fibonacci_vector[fibonacci_vector >= 2]
In the above example, the code inside the brackets returns a logical vector (TRUE or FALSE for each entry) marking which entries are greater than or equal to 2. Indexing the vector with that logical vector then prints only the values whose position is TRUE. (UPDATE: I've included the inner expression on its own line to show what it returns.)
Chapter 3: Matrices
To create a matrix, use the matrix() function. Its main arguments are "data" (the values that fill the matrix), "nrow" and/or "ncol" (the number of rows or columns you want the matrix to have), "byrow" (TRUE fills the matrix row by row; FALSE, the default, fills it column by column), and optionally "dimnames" (a list with the row names, then the column names).
For matrices, you can use rownames() and colnames() to name the rows and columns, respectively.
To sum over the rows of a matrix, use rowSums(). The same documentation page also covers colSums(), colMeans(), and rowMeans(), and these functions work on data frames too (covered later). The na.rm argument is worth remembering: set it to TRUE to ignore missing values.
If you need to add a column (or row), use the cbind() (or rbind()) function. This is helpful if you want to include a totals row or column.
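As a rough sketch of how these matrix functions fit together, the example below builds a small matrix, names its rows and columns, and appends a totals column. The figures and names are made up for illustration.

```r
# Hypothetical sales figures for three films: two values per film
sales <- c(460, 314, 290, 247, 309, 165)

# byrow = TRUE fills row by row, so each film gets one row of two values
sales_matrix <- matrix(sales, nrow = 3, byrow = TRUE)
rownames(sales_matrix) <- c("Film A", "Film B", "Film C")
colnames(sales_matrix) <- c("US", "non-US")

# Row totals, then append them as a new column named Total
totals <- rowSums(sales_matrix)
with_totals <- cbind(sales_matrix, Total = totals)
with_totals
```

Using rbind() with colSums() in the same way would add a totals row instead.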
To learn more about the R workspace, read more here.
Chapter 4: Factors
A factor stores a categorical variable, which can take only a limited number of categories. A continuous variable, by contrast, can take infinitely many values.
To convert a vector to a factor, use the factor() function.
Factors can be either a nominal categorical variable or an ordinal categorical variable. Nominal means there is no implied order, while ordinal has an implied order. An ordinal factor requires the arguments ordered = TRUE and levels = (the levels, listed from lowest to highest). Writing order = TRUE also works, because R partially matches argument names.
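A short sketch contrasting the two kinds of factor. The data and variable names are hypothetical, and the argument is spelled out in full as ordered = TRUE, the name that appears in the signature of factor().

```r
# Hypothetical speed ratings
speed_vector <- c("medium", "slow", "slow", "fast", "medium")

# Nominal factor: levels are sorted alphabetically and carry no ordering
nominal_speed <- factor(speed_vector)
levels(nominal_speed)   # "fast" "medium" "slow"

# Ordinal factor: levels are listed from lowest to highest
ordinal_speed <- factor(speed_vector,
                        ordered = TRUE,
                        levels = c("slow", "medium", "fast"))
ordinal_speed

# Comparisons are only meaningful for ordered factors
ordinal_speed[1] > ordinal_speed[2]   # TRUE: "medium" ranks above "slow"
```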
Factor Levels:
The levels() function returns the levels of a factor, and levels(x) <- value replaces them. Be careful when replacing: the new names are matched to the existing levels by position, and the existing levels are sorted alphabetically by default, so list the new names in that same order.
The function summary() gives a summary of the object argument. See example below:
# Build factor_survey_vector with clean levels
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")
factor_survey_vector
# Generate summary for survey_vector
summary(survey_vector)
# Generate summary for factor_survey_vector
summary(factor_survey_vector)
Chapter 5: Data frames
Data frames can house data of different types, whereas a matrix must be of the same type throughout.
If a data frame is too large, you can choose to see the top (or bottom) few rows by using the head() (or tail()) function.
Structure of an Object:
To see the structure of an object, use the str() function.
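A minimal sketch of head(), tail(), and str() on a small data frame. The data frame measurements and its columns are hypothetical, built only so that the example runs on its own.

```r
# Hypothetical data frame with 20 rows
measurements <- data.frame(id = 1:20,
                           value = seq(0.5, 10, by = 0.5))

head(measurements)          # first 6 rows by default
tail(measurements, n = 3)   # last 3 rows (ids 18, 19, 20)
str(measurements)           # row and column counts, plus each column's type
```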
Creating a Data frame:
To create a data frame, use the function data.frame(). Pass the data vectors in the order you want the columns to appear. The function also accepts optional arguments, which are described in its documentation.
You can select a column by name, either with brackets and the column name or with the shortcut data_frame$column_name.
name <- c("Mercury", "Venus", "Earth",
"Mars", "Jupiter", "Saturn",
"Uranus", "Neptune")
type <- c("Terrestrial planet",
"Terrestrial planet",
"Terrestrial planet",
"Terrestrial planet", "Gas giant",
"Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532,
11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1, 1.03,
0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
# Create a data frame from the vectors
planets_df <- data.frame(name, type, diameter, rotation, rings)
planets_df
planets_df[,"name"]
planets_df$name
Note from course: The example with rings_vector only worked because that column was already all TRUE/FALSE. You can pull all rows that are just Gas giants like this:
planets_df_gas <- planets_df$type == "Gas giant"
planets_df[planets_df_gas,]
Subsets:
The subset() function returns the rows of a data frame that meet a condition.
subset(planets_df, subset = rings)
subset(planets_df, subset = diameter < 1)
The first example above really reads as subset(planets_df, subset = rings == TRUE) since that column only has TRUE/FALSE values.
Sorting:
The order() function lets you sort objects. For whatever column you pick, it returns an integer vector of row positions, arranged so that the column's values come out sorted (ascending by default). You can then use that vector as the row index of the data frame to get the sorted rows. See example below:
planets_df_rotation_order <- order(planets_df$rotation)
planets_df_rotation_order
planets_df[planets_df_rotation_order,] #This is smallest to largest rotation order.
planets_df_diameter <- order(planets_df$diameter, decreasing = TRUE) #This is largest to smallest diameter order.
planets_df[planets_df_diameter,]