Course Notes

The order in which you perform mathematical operations is critical to getting the correct answer. The correct order of operations is:

Parentheses, Exponentiation, Multiplication and Division, Addition and Subtraction

Or PEMDAS for short!
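
As a quick check of these rules in R, here is a minimal sketch. The numbers and variable names are arbitrary illustrations, not course data. One detail worth noting: R applies exponentiation before a leading minus sign.

```r
# Multiplication happens before addition
a <- 1 + 2 * 3        # 7

# Parentheses override the default order
b <- (1 + 2) * 3      # 9

# Exponentiation happens before multiplication
c_val <- 2 * 3 ^ 2    # 18

# Exponentiation also binds tighter than a leading minus sign
d <- -2 ^ 2           # -4
e <- (-2) ^ 2         # 4
```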

For example, a 5% return on 100 of starting cash gives:

105 = 100 * 1.05

Or in terms of variables:

post_jan_cash <- starting_cash * jan_mult

A quick way to get the multiplier is:

multiplier = 1 + (return / 100)

**To get started, here are some of R's most basic data types:**

Numerics are decimal numbers like 4.5.

A special type of numeric is an integer, which is a numeric without a decimal piece. Integers must be specified like 4L.

Logicals are the boolean values TRUE and FALSE. Capital letters are important here; true and false are not valid.

Characters are text values like "hello world".

Up until now, you have been determining what data type a variable is just by looks. There is actually a better way to check this.

class(my_var)
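
A minimal sketch of class() applied to each of the basic types listed earlier. The variable names my_var and my_class are placeholders chosen for illustration.

```r
class(4.5)            # "numeric"
class(4L)             # "integer"
class(TRUE)           # "logical"
class("hello world")  # "character"

# The same check on a variable
my_var <- 4.5
my_class <- class(my_var)
my_class              # "numeric"
```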

In R, ls() is a function that lists the objects (e.g. variables, functions, datasets) in the current working environment. When you call the ls() function with no arguments, it will return a character vector of the names of all objects in the current workspace.

A vector is a collection of objects of the same data type. Remember, you create a vector using the combine function, c(), and each element you add is separated by a comma.

For example, this is a vector of Apple's stock prices from December 2016:

apple_stock <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12)

And this is a character vector of bond credit ratings:

credit_rating <- c("AAA", "AA", "BBB", "BB", "B")

And this is a logical vector:

logic <- c(TRUE, FALSE, TRUE)

The hierarchy for coercion is:

logical < integer < numeric < character
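
The hierarchy can be seen by mixing types inside c(): the result takes the type furthest to the right in the ordering. A small sketch with made-up values:

```r
# Logical mixed with integer becomes integer (TRUE is stored as 1L)
lgl_int <- c(TRUE, 2L)
class(lgl_int)        # "integer"

# Integer mixed with numeric becomes numeric
int_num <- c(1L, 2.5)
class(int_num)        # "numeric"

# Numeric mixed with character becomes character
num_chr <- c(1.5, "AAA")
class(num_chr)        # "character"
num_chr               # "1.5" "AAA"
```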

Inside of plot(), you can change the type of your graph using type =. The default is "p" for points, but you can also change it to "l" for line.

Here is the 12 month return vector:

ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)

Select the first month: ret[1].

Select the first month by name (this requires the vector to have names): ret["Jan"].

Select the first three months: ret[1:3] or ret[c(1, 2, 3)]
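
Selecting by name only works once the vector has names; without them, ret["Jan"] returns NA. The sketch below assigns month names using the built-in constant month.abb. Using month.abb and the variable first_quarter are choices made for this illustration, not something the notes prescribe.

```r
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)

# month.abb is a built-in constant: "Jan", "Feb", ..., "Dec"
names(ret) <- month.abb

ret[1]        # first month by position
ret["Jan"]    # first month by name, value 5
ret[1:3]      # first three months by position

# The same three months, selected by name
first_quarter <- ret[c("Jan", "Feb", "Mar")]
```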

Correlation is a measure of association between two things and is represented by a number from -1 to 1.

A 1 represents perfect positive correlation, a -1 represents perfect negative correlation, and a correlation of 0 means there is no linear relationship between the movements of the two stocks.
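
To make the two extremes concrete, here is a small sketch using cor() on hypothetical price vectors (stock_a, stock_b, and stock_c are invented for illustration). A series that is an increasing linear function of another has correlation 1, and a decreasing linear function has correlation -1.

```r
# Hypothetical prices, not course data
stock_a <- c(10, 11, 12, 13, 14)
stock_b <- 2 * stock_a + 1    # rises whenever stock_a rises
stock_c <- 100 - stock_a      # falls whenever stock_a rises

cor_pos <- cor(stock_a, stock_b)   # perfect positive correlation
cor_neg <- cor(stock_a, stock_c)   # perfect negative correlation
```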

Data frames are great because of their ability to hold a different type of data in each column.

To select the entire first row, leave the column index empty: stocks[1, ]

To select the first two rows: stocks[1:2, ] or stocks[c(1,2), ]

To select an entire column, leave the row empty: stocks[, 1]

You can also select an entire column by name: stocks[, "apple"]
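
The selections above can be tried on a small stand-in data frame. The apple values are taken from the price vector earlier in these notes; the ibm values are invented, since the course's stocks data is not reproduced here.

```r
# Stand-in data: ibm prices are hypothetical
stocks <- data.frame(
  apple = c(109.49, 109.90, 109.11),
  ibm   = c(159.82, 160.02, 159.84)
)

first_row <- stocks[1, ]         # data frame with one row
first_two <- stocks[1:2, ]       # data frame with two rows
first_col <- stocks[, 1]         # numeric vector
apple_col <- stocks[, "apple"]   # same column, selected by name
```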

You can create a data frame directly:

data.frame(company = c("A", "A", "B"), cash_flow = c(100, 200, 300), year = c(1, 3, 2))

Or from existing vectors:

company <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year <- c(1, 3, 2)

data.frame(company, cash_flow, year)

head() - Returns the first few rows of a data frame. By default, 6. To change this, use head(cash, n = ___)

tail() - Returns the last few rows of a data frame. By default, 6. To change this, use tail(cash, n = ___)

str() - Check the structure of an object. This fantastic function will show you the data type of the object you pass in (here, data.frame), and will list each column variable along with its data type.

cash$cash_flow is the same as cash[,"cash_flow"]

You can delete a column by assigning it NULL.

To keep only the rows that meet a condition, use subset(): subset(cash, company == "A")

Creating new columns in your data frame is as simple as assigning the new information to data_frame$new_column
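
The data frame operations described in this part of the notes can be sketched together on the three-row example. The column name doubled is hypothetical and exists only to show adding and then deleting a column.

```r
cash <- data.frame(
  company   = c("A", "A", "B"),
  cash_flow = c(100, 200, 300),
  year      = c(1, 3, 2)
)

head(cash, n = 2)   # first two rows
tail(cash, n = 1)   # last row
str(cash)           # structure: one line per column with its type

# $ and [, "name"] return the same column
same_column <- identical(cash$cash_flow, cash[, "cash_flow"])

# Add a new column, then delete it again by assigning NULL
cash$doubled <- cash$cash_flow * 2
cash$doubled <- NULL

# Keep only the rows for company A
cash_A <- subset(cash, company == "A")
```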

present_value <- cash_flow * (1 + interest / 100) ^ -year

Note that in the example below, you are creating the plot from a factor and not a character vector. R will throw an error if you try to plot a character vector!

If the list has not been created yet, you can name its elements as you create it:

my_list <- list(my_words = words, my_numbers = numbers)

Or, if the list is created without names, add them afterwards:

my_list <- list(words, numbers)
names(my_list) <- c("my_words", "my_numbers")

To access the elements in the list, use [ ]. This will always return another list.

my_list[1]

$my_words
[1] "I <3 R"

my_list[c(1,2)]

$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

To pull out the data inside each element of your list, use [[ ]].

my_list[[1]]

[1] "I <3 R"

If your list is named, you can also use the $ operator: my_list$my_words. This is the same as using [[ ]] to return the inner data.
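
A runnable sketch of the three access styles. The values of words and numbers are inferred from the printed output in these notes; as_list and as_value are names chosen for illustration.

```r
words <- "I <3 R"
numbers <- c(42, 24)
my_list <- list(my_words = words, my_numbers = numbers)

# Single brackets keep the list wrapper
as_list <- my_list[1]
class(as_list)      # "list"

# Double brackets return the element itself
as_value <- my_list[[1]]
class(as_value)     # "character"

# $ on a named list matches [[ ]] with the name
same_data <- identical(my_list$my_numbers, my_list[["my_numbers"]])
```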

Add weight: 20% Apple, 80% IBM

portfolio$weight <- c(apple = 0.2, ibm = 0.8)

To remove an element from a list, assign it NULL:

my_list$dans_movie <- NULL

If your list is not named, you can also remove elements by position using my_list[1] <- NULL or my_list[[1]] <- NULL

Create a grouping to split on, and use split() to create a list of two data frames.

grouping <- cash$company
split_cash <- split(cash, grouping)

split_cash

$A
  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4

$B
  company cash_flow year
4       B      1500    1
5       B      1100    2
6       B       750    4
7       B      6000    5

To get your original data frame back, use unsplit(split_cash, grouping).
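
The split and unsplit round trip can be reproduced with the seven rows shown in the output above. The "apply" step here uses sapply() to total each company's cash flow; that step and the names totals and original are additions for illustration, not part of the course code.

```r
# Data reconstructed from the printed split_cash output
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# Split: one data frame per company
grouping <- cash$company
split_cash <- split(cash, grouping)
names(split_cash)   # "A" "B"

# Apply: total cash flow within each group
totals <- sapply(split_cash, function(df) sum(df$cash_flow))

# Combine: rebuild the original row order
original <- unsplit(split_cash, grouping)
```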

# Starting cash and returns 
starting_cash <- 200
jan_ret <- 4
feb_ret <- 5

# Multipliers
jan_mult <- 1 + (jan_ret/ 100)
feb_mult <- 1 + (feb_ret/ 100)

# Total cash at the end of the two months
total_cash <- starting_cash * jan_mult * feb_mult

# Print total_cash
total_cash

# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")

# Assign company names to your vectors
names(ret) <- companies
names(weight) <- companies

# Multiply the returns and weights together 
ret_X_weight <- ret * weight

# Print ret_X_weight
ret_X_weight

# Sum to get the total portfolio return
portf_ret <- sum(ret_X_weight)

# Print portf_ret
portf_ret

my_vector <- c(2, 3, 4, 5)

matrix(data = my_vector, nrow = 2, ncol = 2)

# Correlation of Apple and IBM
cor(apple, ibm)

# stock matrix
stocks <- cbind(apple, micr, ibm)
stocks
# cor() of all three
cor(stocks)

colnames(cash) <- c("company", "cash_flow", "year")

# Select the first row: cash[1, ]
# Select the first column: cash[, 1]
# Select the first column by name: cash[, "company"]

# Present value of $4000, in 3 years, at 5%
present_value_4k <- 4000 * 1.05 ^-3

# Present value of all cash flows
cash$present_value <- cash$cash_flow * (1 + 5/100)^ -cash$year

# Print out cash
cash

cash_A <- subset(cash, company == "A")

sum(cash_A$present_value)

# Total present value of cash
total_pv <- sum(cash$present_value)

# Company B information
cash_B <- subset(cash, company == "B")

# Total present value of cash_B
total_pv_B <- sum(cash_B$present_value)
total_pv_B

# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# Create a factor from credit_rating
credit_factor <- factor(credit_rating)

# Print out your new factor
credit_factor

# Call str() on credit_rating
str(credit_rating)

# Call str() on credit_factor
str(credit_factor)


# Identify unique levels
levels(credit_factor)

# Rename the levels of credit_factor
levels(credit_factor) <- c("2A", "3A", "1B", "2B", "3C")

# Print credit_factor
credit_factor

# Summarize the character vector, credit_rating
summary(credit_rating)

# Summarize the factor, credit_factor
summary(credit_factor)

# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))

# Rename the levels 
levels(AAA_factor) <- c("low", "medium", "high", "very_high")

# Print AAA_factor
AAA_factor

# Plot AAA_factor
plot(AAA_factor)

# Use unique() to find unique words
unique(credit_rating)

# Create an ordered factor
credit_factor_ordered <- factor(credit_rating, ordered = TRUE, levels = c("AAA", "AA", "BB", "B", "CCC"))

# Plot credit_factor_ordered
plot(credit_factor_ordered)


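# Remove the first element and drop its now-unused level
credit_factor[-1, drop = TRUE]
# Output recorded from the course example:
# [1] AA  A   BBB AA  BBB A
# Levels: BBB < A < AA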

# Remove the A bonds at positions 3 and 7. Don't drop the A level.
keep_level <- credit_factor[-c(3, 7)]
credit_factor
# Plot keep_level
plot(keep_level)

# Remove the A bonds at positions 3 and 7. Drop the A level.
drop_level <- credit_factor[-c(3, 7), drop = TRUE]

# Plot drop_level
plot(drop_level)
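
The difference between keeping and dropping a level is easiest to see on a self-contained factor. The sketch below is modeled on the output recorded earlier (levels BBB < A < AA after removing the first element); the first element, "AAA", is an assumption, since the notes do not show it.

```r
# Ordered factor; the leading "AAA" is assumed for illustration
credit_factor <- factor(
  c("AAA", "AA", "A", "BBB", "AA", "BBB", "A"),
  ordered = TRUE,
  levels = c("BBB", "A", "AA", "AAA")
)

# Plain subsetting keeps AAA as a level, now with a count of zero
keep_level <- credit_factor[-1]
levels(keep_level)    # "BBB" "A" "AA" "AAA"
summary(keep_level)

# drop = TRUE removes levels that no longer appear
drop_level <- credit_factor[-1, drop = TRUE]
levels(drop_level)    # "BBB" "A" "AA"
```

When plotted, keep_level would show an empty AAA bar, while drop_level would omit it.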

# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")

# Create the data frame of character vectors, bonds
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)

# Use str() on bonds
str(bonds)

# Create a factor column in bonds called credit_factor from credit_rating
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE, levels = c("AAA", "A", "BB"))

# Use str() on bonds again
str(bonds)

# For split-apply-combine

grouping <- cash$company
split_cash <- split(cash, grouping)

# We can access each list element's cash_flow column by:
split_cash$A$cash_flow
# [1] 1000 4000  550

split_cash$A$cash_flow <- split_cash$A$cash_flow * 2
split_cash$B$cash_flow <- split_cash$B$cash_flow * 3

new_cash <- unsplit(split_cash, grouping)