Course Notes

Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! For courses that use data, the datasets will be available in the datasets folder.

# Import any packages you want to use here

Take Notes

#n() = returns the size of the current group (used inside dplyr verbs such as summarize()).
#str() = shows the structure of an R object, such as a data frame.
#ls() = lists the names of the objects in an environment (the current one by default).
#sort() = sorts a vector in ascending order (decreasing = TRUE sorts in descending order).
#rev() = reverses the elements of a vector.
#append() = merges vectors or lists.
#seq() = generates sequences by specifying from, to, and by (i.e., seq(from = 1, to = 10, by = 3); output = 1, 4, 7, 10).
#is.*() = checks the class of an R object (i.e., is.numeric()).
#as.*() = converts an R object from one class to another (i.e., as.numeric()).
#data.frame() = builds a data frame from vectors of the same length; each vector becomes a column (variable).
#if() = runs the code that follows only when a condition is TRUE. The condition must be a single TRUE or FALSE, not a whole vector.
#else = has no parentheses; it tells R which code to run when the if() condition is not met.
#while() = repeats the code that follows for as long as a condition stays TRUE.
#break = has no parentheses; it is placed inside a loop (usually within an if()) to stop the loop once a certain condition is met.
#c() = combines elements into a vector.
#nrow() = counts rows.
#ncol() = counts columns.
#nchar() = counts the number of characters in each element of a vector.
#"&" = logical AND; the result is TRUE only when the conditions on both sides of the "&" are TRUE.
#"!" = a negation symbol (for example, "!=" means "is not equal to").
#">=" or "<=" = is greater than or equal to, and is less than or equal to.
#brackets [] = call a particular element from a vector; double brackets [[]] = call a particular element from a list.
#curly brackets {} = hold the block of code that runs after a call such as if(), for(), or function().
#function() = a function to create functions.
#"=" vs "==" = "=" is an assignment sign, like "<-", while "==" is the "is equal to" comparison.
#print() = prints the value of an object.
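A minimal sketch of several of the base R helpers and control-flow pieces listed above. The object names (v, x, info, and so on) are made up for illustration and do not come from the course.

```r
# Vector helpers
v <- seq(from = 1, to = 10, by = 3)    # 1 4 7 10
v_desc <- rev(sort(v))                 # sorted ascending, then reversed
v_more <- append(v, c(13, 16))         # adds two more elements at the end

# Class checks and conversions
x <- "42"
x_is_num <- is.numeric(x)              # FALSE, because x is a character string
x_num <- as.numeric(x)                 # converted to the number 42

# if / else with two conditions joined by "&"
label <- if (x_num >= 40 & x_num != 100) "large" else "small"

# while loop that is stopped early with break
i <- 0
while (TRUE) {
  i <- i + 1
  if (i == 5) break
}

# [ ] picks an element of a vector, [[ ]] picks an element of a list
second <- v[2]
info <- list(name = "Ana", scores = c(90, 85))
scores <- info[["scores"]]

# nchar() counts characters in each element
n_chars <- nchar(c("R", "data"))
```

Printing any of these objects (i.e., print(v_desc)) shows the result of each step.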
#";" = separates two statements written on the same line (the same as jumping down to the next line).
#paste() = similar to CONCATENATE in Excel; it joins vectors after converting them to characters.
#(at least in DataCamp) you have to highlight the entire code to run it or hit the "Run Code" button.
#next = instead of stopping a loop, as break does, next skips the rest of the current iteration and moves on to the next element when a condition is met.
#for() = loops use the "in" keyword and go through the elements of a vector one by one (a for() loop can be nested inside another).
#args() = shows the arguments of the function named within the parentheses.
#na.rm = a logical argument that indicates whether to remove "NA" values before calculating (TRUE = ignores any "NA's").
#rbind() = adds a vector or a data frame as new rows to a data frame.
#adding a column: assign a vector to a new column name, i.e., students_data$GPA <- c(4.0, 3.5, 3.2, 3.15, 2.9, 3.0). To remove a column, assign NULL to it: students_data$GPA <- NULL
#abs() = returns the absolute value of a numeric value or calculation.
#lapply() = applies a function to each element of a list or vector without writing a loop. lapply returns its results in a list. When passing another function to lapply(x, FUN), do not add parentheses to FUN.
#unlist() = can be used to return results as a vector instead of a list [i.e., unlist(lapply(cities, nchar))].
#sapply() = works in the same way as lapply but simplifies the result to a vector (or a matrix) instead of a list.
#within both sapply and lapply I can use built-in and my own functions. However, sapply is more practical in the sense that it returns a vector and can even build a matrix when each result is a vector of the same length, which lapply does not.
#Also, when the results do not all have the same length, sapply cannot simplify them and returns a list, the same as lapply.
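A short sketch of the loop, apply, and data frame notes above. The cities vector and the students_data values are invented for illustration; only the function names come from the notes.

```r
cities <- c("New York", "Paris", "Tokyo")

# lapply returns a list; unlist or sapply turn the result into a vector
len_list <- lapply(cities, nchar)      # note: nchar without parentheses
len_vec <- unlist(len_list)
len_named <- sapply(cities, nchar)     # vector named after the cities

# for loop that uses next to skip odd numbers
evens <- c()
for (k in 1:6) {
  if (k %% 2 != 0) next
  evens <- c(evens, k)
}

# na.rm = TRUE ignores the NA when calculating
avg <- mean(c(1, NA, 3), na.rm = TRUE)

# paste joins values after converting them to characters
greeting <- paste("Score:", 10)

# Add a column, add a row with rbind, then remove the column again
students_data <- data.frame(name = c("Ana", "Ben"), age = c(20, 22))
students_data$GPA <- c(4.0, 3.5)
students_data <- rbind(students_data,
                       data.frame(name = "Cy", age = 21, GPA = 3.2))
n_students <- nrow(students_data)
students_data$GPA <- NULL
```

Running args(paste) in the console lists the arguments that paste accepts, such as sep and collapse.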
#vapply() = works like sapply, but the type and length of each result must be declared through FUN.VALUE (i.e., numeric(3)). When each result has more than one element, vapply returns a matrix with that many rows and one column per input element.
#Also, the number inside numeric(#) must equal the length of the vector that the applied function returns; otherwise vapply gives an error.
#strsplit() = splits each element of a character vector at a given separator and returns the pieces in a list.
#tolower() = changes string characters to lowercase letters.
#grepl() = returns TRUE for each element of a character vector in which a pattern is found (FALSE otherwise).
#grep() = returns the positions (indexes) of the elements where grepl would be TRUE.
#Also, the indexes returned by grep can be stored in a vector and then used within brackets to subset the original vector.
#which() = wrapping which around grepl gives the same result as grep.
#sub() = replaces a pattern in each character string, i.e., a letter at the start ("^a") or at the end ("a$"). Only the first match in each string is replaced, not all of them.
#gsub() = does the same as sub() but for all matches in the character string.
#Sys.Date() = gives back the current system date ("2023-05-31"). Before comparing it with other dates, make sure they are in the same Date format (see as.Date).
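The pattern-matching and vapply notes above can be sketched as follows, with a few lines on date arithmetic added at the end. The words and temps objects and the example dates are illustrative assumptions.

```r
words <- c("Apple", "banana", "Avocado", "cherry")
lower <- tolower(words)

# grepl gives TRUE/FALSE per element; grep gives the matching positions
starts_a <- grepl("^a", lower)
idx <- grep("^a", lower)
idx_which <- which(starts_a)           # same positions as grep
a_words <- lower[idx]                  # subset with the indexes

# sub replaces the first match only, gsub replaces every match
first_only <- sub("a", "A", "banana")
all_matches <- gsub("a", "A", "banana")
at_end <- sub("a$", "A", "banana")     # "a$" matches an "a" at the end

# strsplit returns a list with the pieces of each string
parts <- strsplit("2023-05-31", split = "-")[[1]]

# vapply: each result must be a numeric vector of length 2
temps <- list(c(3, 7, 9), c(6, 9, 12))
m <- vapply(temps, function(x) c(min = min(x), max = max(x)),
            FUN.VALUE = numeric(2))

# Dates count in days, date-times count in seconds
d <- as.Date("2023-05-31")
next_day <- format(d + 1)
same_day <- as.Date("31/05/2023", format = "%d/%m/%Y")
days_since_1970 <- as.integer(as.Date("1970-01-02"))
t <- as.POSIXct("2023-05-31 03:07:00", tz = "UTC")
one_minute_later <- format(t + 60, "%H:%M:%S")
```

Here m is a matrix with one column per element of temps, the minimum in the first row and the maximum in the second.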

#as.Date() = converts a vector of character strings to the Date class; use the format argument when the strings are not written as "YYYY-MM-DD". Dates allow calculations where an increase of 1 is an increment of 1 day (stored as days since 1970-01-01).
#Sys.time() = gives back the current date and time with the time zone ("2023-05-31 03:07:00 CEST").
#as.POSIXct() = converts to the date-time class. You can do calculations with this class as well; however, each increase of 1 is an increment of 1 second instead of 1 day as with as.Date (stored as seconds since 1970-01-01).
#packages to deal with dates in an advanced manner: lubridate, zoo, xts
#select() = selects which variables we want to use from a data set. An example would be creating the data set for each of my dissertation studies.
#glimpse() = similar to head(), but shows the data set turned on its side (each variable is shown as a row, with its type and first values).
#count() = counts the number of observations for each value of a variable. We can add sort (i.e., sort = TRUE) to order the counts or wt (i.e., wt = variable) to weight them by another variable. count() can also take two variables/columns, for example, count(variable1, variable2); instead of counting each variable on its own, it counts the observations for every combination of that pairing.
#slice_max() = keeps the rows with the largest values of a variable (within each group when used after group_by()).
#slice_min() = same but for the smallest values.
#select() = within this function we specify which columns to use. For example, select(variable, variable2:variable10).
#contains() = a helper to include columns whose names contain a specified text. For example, select(state, county, contains("word")).
#starts_with() = a helper to include columns whose names start with a specified text. For example, select(state, county, starts_with("word")).
#ends_with() = a helper to include columns whose names end with a specified text.
For example, select(state, county, ends_with("word")).
#last_col() = a helper to include the last column.
#matches() = a helper to include columns whose names match a pattern (regular expression).
#removing a column can be done by putting a "-" before a particular variable/column. For example, select(-variable).
#rename() = substitutes a variable name with a new name, written as rename(new_name = old_name). Renaming can also be done within select() by giving the column a new name. For example, select(state, county, unemployment_rate = unemployment). Unlike select(), rename() keeps all the other columns, so only the changing variable names have to be listed.
#transmute() = performs mutate and select at the same time; it keeps only the columns named within it.
#%in% = this operator is used within the filter function to search for certain observations contained within a vector. For example, filter(column_name %in% c("specific_observation1", "specific_observation2", "specific_observation3")).
#lag() = takes a vector and returns a vector of the same length with each observation moved one place to the right. For example, v <- c(1, 2, 4, 8), then v - lag(v) = NA, 1, 2, 4. This is good for calculating changes between consecutive values.
#inner_join() = a verb that combines one data table with another by a column, keeping only the observations that match in both. After piping in the first table, name the second table and the column, as follows: inner_join(data_set, by = "column_name"). If the joining columns have different names, or you want to specify suffixes for columns that share a name in both data sets, do as follows: inner_join(data_set, by = c("column_name" = "other_column_name"), suffix = c("_x", "_y")).
#left_join() = works the same way as inner_join, but keeps all observations from the left table and adds the data of the second even when some observations have no match. So, for example, "NA" will appear in the newly created table where there is no match.
#right_join() = the same as left_join() but reversed (i.e., keeps all observations from the right table).
#full_join() = the same as left and right, but it keeps all the data from both tables.
"NA's" can appear in the columns coming from either table.
#semi_join() = a filtering verb that returns the observations of the first table that have a match in the second table (no columns are added).
#anti_join() = a filtering verb that returns the observations of the first table that do not have a match in the second table. This verb is a good way to find data quality issues. For example, misspelled names may create extra observation categories when in fact it was just an error.
#within filter(), is.na() can be used to keep only those observations that contain "NA" within the indicated column. For example, filter(is.na(column_name)).
#replace_na() = replaces "NA" observations with a chosen value, such as zero if this observation in particular should be zero.
#as.integer() = changes a date to integer format. For example, as.integer(as.Date("2014-03-21")) gives the number of days since 1970-01-01.
#bind_rows() =
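A small sketch of the joining and selecting verbs described above. It assumes the dplyr package is installed; the students and grades tables are invented for illustration. Since replace_na() lives in the tidyr package, the sketch swaps in dplyr's coalesce() to fill the "NA" values, which is a stand-in and not the function from the notes.

```r
library(dplyr)  # assumption: dplyr is installed

# Hypothetical example tables
students <- data.frame(id = c(1, 2, 3, 4),
                       name = c("Ana", "Ben", "Cy", "Dee"))
grades <- data.frame(id = c(1, 2, 2, 5),
                     score = c(90, 80, 85, 70))

# Mutating joins: they add the columns of the second table
inner <- inner_join(students, grades, by = "id")  # only matching ids
left <- left_join(students, grades, by = "id")    # all students, NA if no grade
full <- full_join(students, grades, by = "id")    # everything from both tables

# Filtering joins: they keep or drop rows of the first table only
semi <- semi_join(students, grades, by = "id")    # students with a grade
anti <- anti_join(students, grades, by = "id")    # students without a grade

# Find the NA observations, then fill them with zero
no_score <- filter(left, is.na(score))
left_filled <- mutate(left, score = coalesce(score, 0))

# select helpers, count, %in%, and lag
picked <- select(left, id, starts_with("sc"))
dropped <- select(left, -name)
counts <- count(grades, id, sort = TRUE)
some_ids <- filter(students, id %in% c(1, 4))
v <- c(1, 2, 4, 8)
change <- v - lag(v)
```

Comparing nrow(inner), nrow(left), and nrow(full) is a quick way to see how many observations each type of join keeps.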

STATS

#quantile() = gives you the quartiles (i.e., for a variable holding the values 0 to 10, quantile(variable) = 0% = 0, 25% = 2.5, 50% = 5, 75% = 7.5, 100% = 10). However, you can adjust the cut points with the probs argument, typed out or built with seq(from, to, by) (i.e., quantile(variable, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1)) = 0%, 20%, 40%, etc.; or probs = seq(0, 1, 0.2)).
#facet_wrap() = creates multiple graphs within one picture, one per category (i.e., ... + facet_wrap(~ food_category)).
#OUTLIERS in a boxplot are points below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, the lower and upper limits, respectively (the box whiskers stop at the most extreme points within these limits).
#rnorm() = generates a set of random numbers from a normal distribution (i.e., df <- data.frame(variable_1 = rnorm(10), variable_2 = rnorm(10)) will generate a data frame of two variables containing 10 observations each).
#set.seed(#) = makes a randomly generated set of values come out the same way every time. Make sure to always use the same number (i.e., set.seed(1) is not the same as set.seed(2)).
#sample_n() = randomly picks n rows from a data frame (i.e., sample_n(df, 1) picks one row). By default, sampling is done without replacement (replace = FALSE).
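A base R sketch of the quantile, outlier, and random-number notes above. The values vector, with 30 added as an obvious outlier, is an illustrative assumption.

```r
x <- 0:10

# Quartiles by default; probs changes the cut points
q <- quantile(x)
q5 <- quantile(x, probs = seq(0, 1, by = 0.2))

# Outlier limits: below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR
values <- c(x, 30)
q1 <- unname(quantile(values, 0.25))
q3 <- unname(quantile(values, 0.75))
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
outliers <- values[values < lower | values > upper]

# The same seed gives the same random numbers; another seed does not
set.seed(1)
a <- rnorm(10)
set.seed(1)
b <- rnorm(10)
set.seed(2)
other <- rnorm(10)

# Data frame with two random variables of 10 observations each
df <- data.frame(variable_1 = rnorm(10), variable_2 = rnorm(10))
```

Here identical(a, b) is TRUE because both draws follow set.seed(1), while other differs because it follows set.seed(2).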