Skip to content

Course Notes

Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! For courses that use data, the datasets will be available in the datasets folder.

# Import any packages you want to use here

Take Notes

Add notes here about the concepts you've learned and code cells with code you want to keep.

R works with numerous data types. Some of the most basic types to get started are:

Decimal values like 4.5 are called numerics. Whole numbers like 4 are called integers. Integers are also numerics. Boolean values (TRUE or FALSE) are called logical. Text (or string) values are called characters. Note how the quotation marks in the editor indicate that "some text" is a string.

Do you remember that when you added 5 + "six", you got an error due to a mismatch in data types? You can avoid such embarrassing situations by checking the data type of a variable beforehand. You can do this with the class() function, as the code in the editor shows.

In R, you create a vector with the combine function c(). You place the vector elements separated by a comma between the parentheses. For example:

numeric_vector <- c(1, 2, 3) character_vector <- c("a", "b", "c") Once you have created these vectors in R, you can use them to do calculations.

You can give a name to the elements of a vector with the names() function. Have a look at this example:

some_vector <- c("John Doe", "poker player") names(some_vector) <- c("Name", "Profession") This code first creates a vector some_vector and then gives the two elements a name. The first element is assigned the name Name, while the second element is labeled Profession. Printing the contents to the console yields following output:

Name Profession "John Doe" "poker player"

-

You want to find out the following type of information:

How much has been your overall profit or loss per day of the week? Have you lost money over the week in total? Are you winning/losing money on poker or on roulette? To get the answers, you have to do arithmetic calculations on vectors.

It is important to know that if you sum two vectors in R, it takes the element-wise sum. For example, the following three statements are completely equivalent:

c(1, 2, 3) + c(4, 5, 6) c(1 + 4, 2 + 5, 3 + 6) c(5, 7, 9) You can also do the calculations with variables that represent vectors:

a <- c(1, 2, 3) b <- c(4, 5, 6) c <- a + b

# Poker and roulette winnings from Monday to Friday:

poker_vector <- c(140, -50, 20, -120, 240)

roulette_vector <- c(-24, -50, 100, -350, 10)

days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

names(poker_vector) <- days_vector

names(roulette_vector) <- days_vector

# Calculate total gains for poker and roulette

total_poker <- sum(poker_vector )

total_roulette <- sum(roulette_vector)

# Check if you realized higher total gains in poker than in roulette

total_poker > total_roulette

Vector selection: the good times Your hunch seemed to be right. It appears that the poker game is more your cup of tea than roulette.

Another possible route for investigation is your performance at the beginning of the working week compared to the end of it. You did have a couple of Margarita cocktails at the end of the week…

To answer that question, you only want to focus on a selection of the total_vector. In other words, our goal is to select specific elements of the vector. To select elements of a vector (and later matrices, data frames, …), you can use square brackets. Between the square brackets, you indicate what elements to select. For example, to select the first element of the vector, you type poker_vector[1]. To select the second element of the vector, you type poker_vector[2], etc. Notice that the first element in a vector has index 1, not 0 as in many other programming languages.

# Poker and roulette winnings from Monday to Friday:

poker_vector <- c(140, -50, 20, -120, 240)

roulette_vector <- c(-24, -50, 100, -350, 10)

days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

names(poker_vector) <- days_vector

names(roulette_vector) <- days_vector

# Define a new variable based on a selection

poker_wednesday <- poker_vector[3]

Vector selection: the good times (4) Another way to tackle the previous exercise is by using the names of the vector elements (Monday, Tuesday, …) instead of their numeric positions. For example,

poker_vector["Monday"] will select the first element of poker_vector since "Monday" is the name of that first element.

Just like you did in the previous exercise with numerics, you can also use the element names to select multiple elements, for example:

poker_vector[c("Monday","Tuesday")]

# Poker and roulette winnings from Monday to Friday:

poker_vector <- c(140, -50, 20, -120, 240)

roulette_vector <- c(-24, -50, 100, -350, 10)

days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

names(poker_vector) <- days_vector

names(roulette_vector) <- days_vector

# Select poker results for Monday, Tuesday and Wednesday

poker_start <-poker_vector[c("Monday", "Tuesday", "Wednesday")]

# Calculate the average of the elements in poker_start

mean(poker_start)

Selection by comparison - Step 1 By making use of comparison operators, we can approach the previous question in a more proactive way.

The (logical) comparison operators known to R are:

< for less than

for greater than

<= for less than or equal to

= for greater than or equal to

== for equal to each other

!= not equal to each other

As seen in the previous chapter, stating 6 > 5 returns TRUE. The nice thing about R is that you can use these comparison operators also on vectors. For example:

c(4, 5, 6) > 5 [1] FALSE FALSE TRUE This command tests for every element of the vector if the condition stated by the comparison operator is TRUE or FALSE.

# Poker and roulette winnings from Monday to Friday:
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector

# Which days did you make money on poker?
selection_vector <- poker_vector > 0

# Print out selection_vector
selection_vector

What's a matrix?

In R, a matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns. Since you are only working with rows and columns, a matrix is called two-dimensional.

You can construct a matrix in R with the matrix() function. Consider the following example:

matrix(1:9, byrow = TRUE, nrow = 3)

In the matrix() function:

The first argument is the collection of elements that R will arrange into the rows and columns of the matrix. Here, we use 1:9 which is a shortcut for c(1, 2, 3, 4, 5, 6, 7, 8, 9).

The argument byrow indicates that the matrix is filled by the rows. If we want the matrix to be filled by the columns, we just place byrow = FALSE.

The third argument nrow indicates that the matrix should have three rows.

# all_wars_matrix is available in your workspace all_wars_matrix

# Select the non-US revenue for all movies non_us_all <- all_wars_matrix[,2]

# Average non-US revenue mean(non_us_all)

# Select the non-US revenue for first two movies non_us_some <- all_wars_matrix[1:2,2]

# Average non-US revenue for first two movies mean(non_us_some)