Course Notes
Use this workspace to take notes, store code snippets, or build your own interactive cheatsheet! The datasets used in this course are available in the datasets folder.
The order in which you perform mathematical operations is critical to getting the correct answer. The correct sequence, known as the "order of operations", is:
Parentheses, Exponentiation, Multiplication and Division (left to right), Addition and Subtraction (left to right)
Or PEMDAS for short!
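A quick check in R shows how precedence changes the result. The numbers below are arbitrary and only illustrate the rule.

```r
# Multiplication happens before addition
a <- 2 + 3 * 4       # 2 + 12 = 14

# Parentheses force the addition to happen first
b <- (2 + 3) * 4     # 5 * 4 = 20

# Exponentiation binds tighter than multiplication
c_val <- 2 * 3 ^ 2   # 2 * 9 = 18

print(c(a, b, c_val))
```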
For example, $100 that earns a 5% return grows to $105:
105 = 100 * 1.05
Or in terms of variables:
post_jan_cash <- starting_cash * jan_mult
A quick way to get the multiplier is:
multiplier = 1 + (return / 100)
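The multiplier formula extends to several months: convert each percentage return to a multiplier and multiply them all together. This sketch reuses the $200 starting cash and 4% / 5% returns from the code cell later in these notes; prod() is a base R shortcut used here instead of writing out each multiplication by hand. The variable is called `returns` rather than `return`, since return() is already an R function.

```r
starting_cash <- 200
returns <- c(4, 5)                  # monthly returns in percent

# Turn each percentage return into a multiplier
multipliers <- 1 + (returns / 100)  # 1.04, 1.05

# Compounding: apply every multiplier in turn
ending_cash <- starting_cash * prod(multipliers)
ending_cash                         # 218.4
```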
**To get started, here are some of R's most basic data types:**
- Numerics are decimal numbers like 4.5. A special type of numeric is an integer, which is a numeric without a decimal piece. Integers must be specified with an L suffix, like 4L.
- Logicals are the boolean values TRUE and FALSE. Capital letters are important here; true and false are not valid.
- Characters are text values like "hello world".
Up until now, you have been determining a variable's data type just by looking at it. There is a better way to check this: the class() function.
class(my_var)
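Calling class() on one value of each basic type shows what R reports for it. Note that a whole number typed without the L suffix is still stored as numeric.

```r
class(4.5)            # "numeric"
class(4L)             # "integer"
class(TRUE)           # "logical"
class("hello world")  # "character"

# No L suffix: still numeric, not integer
class(4)              # "numeric"
```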
In R, ls() lists the objects (e.g. variables, functions, datasets) in the current working environment. Called with no arguments, it returns a character vector with the names of the objects in the current workspace; names that begin with a dot are hidden unless you call ls(all.names = TRUE).
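A small sketch of ls() in action; `rate` and `prices` are hypothetical objects created just for this demo, and rm() is the base R function that deletes an object from the workspace.

```r
rate <- 0.05
prices <- c(109.49, 109.90)

# Names of the objects currently in the workspace
ls()

# Delete one object, then list again: "rate" is gone
rm(rate)
ls()
```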
A vector is a collection of objects of the same data type. Remember, you create a vector using the combine function, c(), and each element you add is separated by a comma.
For example, this is a vector of Apple's stock prices from December 2016:
apple_stock <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12)
And this is a character vector of bond credit ratings:
credit_rating <- c("AAA", "AA", "BBB", "BB", "B")
And here is a logical vector:
logic <- c(TRUE, FALSE, TRUE)
When you combine values of different types in one vector, R coerces them all to a single type. The hierarchy for coercion is:
logical < integer < numeric < character
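Mixing types inside c() shows the hierarchy directly: every element is converted to the highest type present.

```r
class(c(TRUE, 1L))     # "integer":   TRUE becomes 1L
class(c(1L, 2.5))      # "numeric":   1L becomes 1
class(c(2.5, "AAA"))   # "character": 2.5 becomes "2.5"

c(TRUE, FALSE, 3)      # 1 0 3 (logicals coerced to numeric)
```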
Inside plot(), you can change the type of graph with the type argument. The default is "p" for points, but you can also set it to "l" for a line.
Here is the 12-month return vector:
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
Select the first month: ret[1]
Select the first month by name: ret["Jan"] (this only works once the vector has names, e.g. after names(ret) <- month.abb)
Select the first three months: ret[1:3] or ret[c(1, 2, 3)]
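Putting the selections together: this sketch assumes the months are named with R's built-in month.abb constant ("Jan", "Feb", ...), which is one way to make ret["Jan"] work.

```r
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
names(ret) <- month.abb

ret[1]                 # first month, by position
ret["Jan"]             # first month, by name
ret[1:3]               # first three months
ret[c(1, 2, 3)]        # same three months
ret[c("Jan", "Mar")]   # several months by name
```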
Correlation is a measure of association between two things and is represented by a number from -1 to 1.
A 1 represents perfect positive correlation, a -1 represents perfect negative correlation, and 0 means there is no linear relationship between how the two stocks move.
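The three reference values can be reproduced with cor() on small made-up series. The last pair shows that a correlation of 0 only rules out a linear relationship: `sq` is completely determined by `s`, yet their correlation is 0.

```r
x    <- c(1, 2, 3, 4, 5)
up   <- c(2, 4, 6, 8, 10)   # exactly 2 * x
down <- c(10, 8, 6, 4, 2)   # falls as x rises

cor(x, up)     # 1
cor(x, down)   # -1

# Symmetric, non-linear relationship
s  <- c(-2, -1, 0, 1, 2)
sq <- s^2
cor(s, sq)     # 0
```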
Data frames are great because of their ability to hold a different type of data in each column.
To select the entire first row, leave the column index empty: stocks[1, ]
To select the first two rows: stocks[1:2, ] or stocks[c(1,2), ]
To select an entire column, leave the row empty: stocks[, 1]
You can also select an entire column by name: stocks[, "apple"]
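A small hypothetical stocks data frame makes these selections concrete. The apple values come from the December 2016 vector above; the ibm values are made up for illustration only.

```r
stocks <- data.frame(
  apple = c(109.49, 109.90, 109.11),
  ibm   = c(166.0, 167.5, 165.2)    # hypothetical prices
)

stocks[1, ]          # first row (a one-row data frame)
stocks[1:2, ]        # first two rows
stocks[c(1, 2), ]    # same two rows
stocks[, 1]          # first column (a plain numeric vector)
stocks[, "apple"]    # same column, selected by name
```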
data.frame(company = c("A", "A", "B"), cash_flow = c(100, 200, 300), year = c(1, 3, 2))
Or create the vectors first and pass them in:
company <- c("A", "A", "B")
cash_flow <- c(100, 200, 300)
year <- c(1, 3, 2)
data.frame(company, cash_flow, year)
head() returns the first few rows of a data frame (6 by default). To change this, use head(cash, n = ___).
tail() returns the last few rows of a data frame (6 by default). To change this, use tail(cash, n = ___).
str() checks the structure of an object. This fantastic function shows the data type of the object you pass in (here, data.frame) and lists each column variable along with its data type.
cash$cash_flow is the same as cash[,"cash_flow"]
You can delete a column by assigning it NULL, e.g. cash$year <- NULL.
To keep only the rows that meet a condition, use subset(): subset(cash, company == "A")
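The functions above, run on the cash data shown in the split() output later in these notes. The column is deleted from a copy so that `cash` itself stays intact; the copy's name is just for this demo.

```r
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

head(cash, n = 3)   # first 3 rows
tail(cash, n = 2)   # last 2 rows
str(cash)           # 7 obs. of 3 variables, with each column's type

# $ and [, "name"] return the same column
identical(cash$cash_flow, cash[, "cash_flow"])   # TRUE

# Keep only company A's rows
cash_A <- subset(cash, company == "A")

# Delete a column by assigning NULL (on a copy)
cash_no_year <- cash
cash_no_year$year <- NULL
```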
Creating new columns in your data frame is as simple as assigning the new information to data_frame$new_column
present_value <- cash_flow * (1 + interest / 100) ^ -year
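Applied to a data frame, the formula creates a new present_value column in one line. This sketch uses the same cash data and the 5% rate from the worked example later in these notes. Raising to the power -year is the same as dividing by (1 + interest / 100) ^ year.

```r
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)
interest <- 5   # percent per year

# New column: discount each cash flow back to today
cash$present_value <- cash$cash_flow * (1 + interest / 100) ^ -cash$year

# Check by hand: $4000 received in 3 years (row 2)
cash$present_value[2]   # about 3455.35
4000 / 1.05^3           # same value, written as a division
```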
When plotting categorical data, plot a factor rather than a character vector: plot() draws a bar chart of counts for a factor, but R will throw an error if you try to plot a character vector!
If the list has not been created yet, name the elements as you create it:
my_list <- list(my_words = words, my_numbers = numbers)
Or, if the list already exists without names, assign the names afterwards:
my_list <- list(words, numbers)
names(my_list) <- c("my_words", "my_numbers")
To access the elements in the list, use [ ]. This will always return another list.
my_list[1]
$my_words
[1] "I <3 R"

my_list[c(1,2)]
$my_words
[1] "I <3 R"

$my_numbers
[1] 42 24

To pull out the data inside each element of your list, use [[ ]].
my_list[[1]]
[1] "I <3 R"
If your list is named, you can use the $ operator to pull out an element by name: my_list$my_words
# Add weights: 20% Apple, 80% IBM
portfolio$weight <- c(apple = 0.2, ibm = 0.8)
To remove an element from a named list, assign NULL to it:
my_list$dans_movie <- NULL
If your list is not named, you can also remove elements by position with my_list[1] <- NULL or my_list[[1]] <- NULL.
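The list operations above in one runnable sketch. The contents of `words` and `numbers` are taken from the printed output in these notes.

```r
words <- "I <3 R"
numbers <- c(42, 24)

my_list <- list(my_words = words, my_numbers = numbers)

first_as_list <- my_list[1]      # [ ] returns a list of length 1
first_value   <- my_list[[1]]    # [[ ]] returns the element itself
by_name       <- my_list$my_numbers

# Remove an element by assigning NULL
my_list$my_words <- NULL
names(my_list)                   # "my_numbers"
```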
Create a grouping to split on, and use split() to create a list of two data frames.
grouping <- cash$company
split_cash <- split(cash, grouping)
split_cash
$A
  company cash_flow year
1       A      1000    1
2       A      4000    3
3       A       550    4

$B
  company cash_flow year
4       B      1500    1
5       B      1100    2
6       B       750    4
7       B      6000    5

To get your original data frame back, use unsplit(split_cash, grouping).
Take Notes
# Starting cash and returns
starting_cash <- 200
jan_ret <- 4
feb_ret <- 5
# Multipliers
jan_mult <- 1 + (jan_ret / 100)
feb_mult <- 1 + (feb_ret / 100)
# Total cash at the end of the two months
total_cash <- starting_cash * jan_mult * feb_mult
# Print total_cash
total_cash
# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")
# Assign company names to your vectors
names(ret) <- companies
names(weight) <- companies
# Multiply the returns and weights together
ret_X_weight <- ret * weight
# Print ret_X_weight
ret_X_weight
# Sum to get the total portfolio return
portf_ret <- sum(ret_X_weight)
# Print portf_ret
portf_ret
my_vector <- c(2, 3, 4, 5)
matrix(data = my_vector, nrow = 2, ncol = 2)
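matrix() fills its values column by column unless you pass byrow = TRUE. Comparing both orders on the same vector makes the difference visible.

```r
my_vector <- c(2, 3, 4, 5)

# Default: fill down the columns
m_col <- matrix(data = my_vector, nrow = 2, ncol = 2)
m_col
#      [,1] [,2]
# [1,]    2    4
# [2,]    3    5

# byrow = TRUE: fill across the rows
m_row <- matrix(data = my_vector, nrow = 2, ncol = 2, byrow = TRUE)
m_row
#      [,1] [,2]
# [1,]    2    3
# [2,]    4    5
```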
# Correlation of Apple and IBM
cor(apple, ibm)
# stock matrix
stocks <- cbind(apple, micr, ibm)
stocks
# cor() of all three
cor(stocks)
colnames(cash) <- c("company", "cash_flow", "year")
Select the first row: cash[1, ]
Select the first column: cash[ ,1]
Select the first column by name: cash[ ,"company"]
# Present value of $4000, in 3 years, at 5%
present_value_4k <- 4000 * 1.05 ^-3
# Present value of all cash flows
cash$present_value <- cash$cash_flow * (1 + 5/100)^ -cash$year
# Print out cash
cash
cash_A <- subset(cash, company == "A")
sum(cash_A$present_value)
# Total present value of cash
total_pv <- sum(cash$present_value)
# Company B information
cash_B <- subset(cash, company == "B")
# Total present value of cash_B
total_pv_B <- sum(cash_B$present_value)
total_pv_B
# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")
# Create a factor from credit_rating
credit_factor <- factor(credit_rating)
# Print out your new factor
credit_factor
# Call str() on credit_rating
str(credit_rating)
# Call str() on credit_factor
str(credit_factor)
# Identify unique levels
levels(credit_factor)
# Rename the levels of credit_factor
levels(credit_factor) <- c("2A", "3A", "1B", "2B", "3C")
# Print credit_factor
credit_factor
# Summarize the character vector, credit_rating
summary(credit_rating)
# Summarize the factor, credit_factor
summary(credit_factor)
# Create 4 buckets for AAA_rank using cut()
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))
# Rename the levels
levels(AAA_factor) <- c("low", "medium", "high", "very_high")
# Print AAA_factor
AAA_factor
# Plot AAA_factor
plot(AAA_factor)
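AAA_rank itself is not shown in these notes, so this sketch uses a few hypothetical ranks to show which bucket each value lands in. cut() builds intervals that are closed on the right, so 100 falls in the top bucket while a rank of exactly 0 would become NA (unless include.lowest = TRUE is set).

```r
ranks <- c(10, 30, 55, 80, 100)   # hypothetical ranks

rank_factor <- cut(x = ranks, breaks = c(0, 25, 50, 75, 100))
levels(rank_factor)   # "(0,25]" "(25,50]" "(50,75]" "(75,100]"

levels(rank_factor) <- c("low", "medium", "high", "very_high")
rank_factor
```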
# Use unique() to find unique words
unique(credit_rating)
# Create an ordered factor
credit_factor_ordered <- factor(credit_rating, ordered = TRUE, levels = c("AAA", "AA", "BB", "B", "CCC"))
# Plot credit_factor_ordered
plot(credit_factor_ordered)
credit_factor[-1, drop = TRUE]
[1] AA A BBB AA BBB A
Levels: BBB < A < AA
# Remove the A bonds at positions 3 and 7. Don't drop the A level.
keep_level <- credit_factor[-c(3,7)]
credit_factor
# Plot keep_level
plot(keep_level)
# Remove the A bonds at positions 3 and 7. Drop the A level.
drop_level <- credit_factor[-c(3,7), drop = TRUE]
# Plot drop_level
plot(drop_level)
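A tiny hypothetical factor shows the effect of drop = TRUE on the levels: removing every "A" element keeps "A" as an empty level unless it is explicitly dropped. Base R's droplevels() is another way to remove unused levels after subsetting.

```r
bonds <- factor(c("AAA", "A", "BB", "A"))
levels(bonds)                               # "A" "AAA" "BB"

keep_level <- bonds[-c(2, 4)]               # A elements gone, level kept
levels(keep_level)                          # "A" "AAA" "BB"

drop_level <- bonds[-c(2, 4), drop = TRUE]  # unused level dropped
levels(drop_level)                          # "AAA" "BB"
```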
# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")
# Create the data frame of character vectors, bonds
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)
# Use str() on bonds
str(bonds)
# Create a factor column in bonds called credit_factor from credit_rating
bonds$credit_factor <- factor(bonds$credit_rating, ordered = TRUE, levels = c("AAA", "A", "BB"))
# Use str() on bonds again
str(bonds)
# For split-apply-combine
grouping <- cash$company
split_cash <- split(cash, grouping)
# We can access each list element's cash_flow column by:
split_cash$A$cash_flow
[1] 1000 4000 550
split_cash$A$cash_flow <- split_cash$A$cash_flow * 2
split_cash$B$cash_flow <- split_cash$B$cash_flow * 3
new_cash <- unsplit(split_cash, grouping)
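The full split-apply-combine round trip, using the cash data printed in the split() output earlier in these notes. unsplit() puts each row back in its original position, so the result lines up with the original data frame.

```r
cash <- data.frame(
  company   = c("A", "A", "A", "B", "B", "B", "B"),
  cash_flow = c(1000, 4000, 550, 1500, 1100, 750, 6000),
  year      = c(1, 3, 4, 1, 2, 4, 5)
)

# Split: one data frame per company
grouping <- cash$company
split_cash <- split(cash, grouping)

# Apply: double A's cash flows, triple B's
split_cash$A$cash_flow <- split_cash$A$cash_flow * 2
split_cash$B$cash_flow <- split_cash$B$cash_flow * 3

# Combine: rows return to their original order
new_cash <- unsplit(split_cash, grouping)
new_cash$cash_flow   # 2000 8000 1100 4500 3300 2250 18000
```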