Algorithmic Trading in R Tutorial

February 9th, 2017 in R Programming

In this post, I will show how to use R to collect the stocks listed on loyal3, get historical data from Yahoo and then perform a simple algorithmic trading strategy. Along the way, you will learn some web scraping, a function hitting a finance API and an htmlwidget to make an interactive time series chart.

For this post, a trading algo is defined as a set of rules that trigger a buy or sell event rather than a predictive model or time series forecast. This is the simplest type of trading algo, but if you are interested in digging deeper into finance with R, I would encourage you to take DataCamp’s course in modelling a quantitative trading strategy in R.

Background

In 2015, I started investing a little at loyal3. Their service is unusual and a great place to start your investment journey. Rather than charge the investor for trades, loyal3 charges the companies to list on their platform. The premise is that people who like a company’s service would also buy the stock and in doing so become strong brand advocates. Making the platform more compelling is that you can buy fractional shares. So, you can get into that $800 Amazon stock for only $10 and buy another $10 fraction each time you have a bit of extra cash at the end of the month. Sure, there are friction costs since you have to trade in windows and your entire portfolio is limited to ~70 stocks, but loyal3 represents a fun and low cost way to explore equity trading. You can put real skin in the game for as little as $10!

To be clear, I have the typical retirement and investment accounts but I like loyal3’s clean interface on the app and the lack of fees. I end up checking my fun loyal3 portfolio more often than my mutual funds simply because it is easy and amusing to see the performance of the stocks I directly picked.

The stocks that are available at www.loyal3.com

Setting Up Your Workspace

To start, load the libraries into your environment. I almost always use rvest for web scraping these days. There are other packages that work, including RSelenium, but I like how easy rvest is to use.

The second package, pbapply, is optional because it simply adds a progress bar to the apply functions. Since you could be scraping hundreds of web pages a progress bar can be helpful to estimate the time.

Next, TTR is a package that I just started to explore. The library is used to construct “Technical Trading Rules”. Although you will learn a simple trading algo in this post, the TTR package can perform more sophisticated calculations and is worth learning.

The dygraphs library is a wrapper for a fast, open source JavaScript charting library. It is one of the htmlwidgets that makes R charting more dynamic and part of an html file instead of a static image. Lastly, the lubridate package is used for easy date manipulation.

library(rvest)
library(pbapply)
library(TTR)
library(dygraphs)
library(lubridate)

Data Collection

The loyal3 stocks are all listed on a single page. Before you can look up individual daily stock prices to build your trading algorithm, you need to collect all available stock tickers. The first thing to do is declare stock.list as a URL string. Next use read_html() so your R session will create an Internet session and collect all the html information on the page as an XML node set. The page marks each company name with the CSS class “company-name”, which is written as the selector “.company-name”. Use this as a parameter when calling html_nodes() to select only the XML data associated to this node. Lastly, use html_text() so the actual text values for the company names are collected.

stock.list<-'https://www.loyal3.com/stocks'
stocks<-read_html(stock.list)
stocks.names<-html_nodes(stocks,'.company-name')
stocks.names<-html_text(stocks.names)
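To see what each step returns without hitting the live site, here is a minimal sketch that runs the same three calls on a small inline HTML snippet. The markup and the company names are made up for illustration; only the “.company-name” selector comes from the page described above.

```r
library(rvest)

# Hypothetical markup standing in for the loyal3 stock page
page <- read_html('<html><body>
  <div class="company-name">Alpha Corp</div>
  <div class="company-name">Beta Inc</div>
  <div class="other">Not a company</div>
</body></html>')

# ".company-name" is a CSS class selector, so only the first two divs match
nodes <- html_nodes(page, '.company-name')

# html_text() drops the tags and keeps the text inside each node
company.names <- html_text(nodes)
company.names
```

The result is a plain character vector with one element per matched node, which is the same shape as stocks.names.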

To examine the stocks that are available on loyal3, you can print the stocks.names object to your console. This returns the company name as a text vector.

stocks.names 

In order to research the stock prices, you need to get the ticker symbol first. When you are on the loyal3 site, you can click on the company tile to load a page with a ticker symbol and other company information.

Using html_nodes() on stocks, you pull all nodes marked with an “a.” In HTML the <a> tag defines a hyperlink which is used to link from one page to another. Within the hyperlink tag, the “href” refers to the exact URL address. So html_attr() will extract the URL for ALL links on the page if you pass in “href”.

After doing some manual inspection, I found the 54th to 123rd links on the page represent the company pages I need in order to scrape the ticker information. The last line uses paste0() to concatenate the base URL string 'http://www.loyal3.com' to the specific company pages, like “/WALMART”. For example, http://www.loyal3.com/WALMART:

loyal.links<-html_nodes(stocks, "a")
loyal.links<-html_attr(loyal.links, "href")
stock.links<-paste0('http://www.loyal3.com',loyal.links[54:123])
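Selecting links by position works until the page layout changes. As a rough alternative, the sketch below filters the links by pattern instead. It assumes that company pages are a slash followed by capital letters only, which is a guess based on the single “/WALMART” example above, and the href values are invented for the demonstration.

```r
# Hypothetical href values, as html_attr(..., "href") might return them
loyal.links <- c('/', '/about', '/WALMART', '/AMC',
                 'https://twitter.com/loyal3', NA)

# Assumption: a company page is "/" followed by capital letters only
is.company <- !is.na(loyal.links) & grepl('^/[[:upper:]]+$', loyal.links)

# Same paste0() step as in the post, applied to the filtered links
stock.links <- paste0('http://www.loyal3.com', loyal.links[is.company])
stock.links
```

Whether a pattern like this is reliable depends on the real page, so it is worth comparing its output against the manually inspected range before trusting it.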

On each of the company pages there is a description, a recent closing price and the ticker. All company pages are organized the same so the custom function get.ticker() can be used to extract the ticker symbol.

Within a company’s web page there is a table marked with the class “ticker-price”. The function will navigate to a company page, identify the appropriate table, and extract the text with html_text(). Lastly, using sub() along with the regular expression ^([[:alpha:]]*).* and \\1 will retain only the leading run of alphabetical characters. The result is that any special characters, like $, and any numeric characters, like the closing price, are removed. As the function reads each of the 70 pages, it will only collect the stock ticker.

get.ticker<-function(url){
  x<-read_html(url)
  x<-html_node(x,'.ticker-price')
  x<-html_text(x)
  x<-sub("^([[:alpha:]]*).*", "\\1", x)
  return(x)
  } 
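The regular expression is easier to follow on a few sample strings. This sketch uses invented ticker and price text, since the exact text returned from the page is not shown in this post. It also adds trimws(), which is not part of get.ticker(), to illustrate one case where the pattern alone returns an empty string.

```r
# Made-up examples of text that html_text() might return
raw.text <- c('BABA$87.81', 'AMC $33.10', ' TWTR$16.30')

# Keep the leading run of letters and drop everything after it
untrimmed <- sub('^([[:alpha:]]*).*', '\\1', raw.text)
untrimmed

# With leading whitespace there are no letters at the start,
# so trimming first is a cheap safeguard
tickers <- sub('^([[:alpha:]]*).*', '\\1', trimws(raw.text))
tickers
```

The third string comes back empty in the untrimmed version because the capture group matches zero letters before the space.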

The loyal3 stock page for Alibaba, where you see the table containing the stock’s ticker, BABA, below the bolded text

Armed with your custom function, use pblapply() to apply it to each of the stock.links which contain each company’s page. The resulting object, stock.tickers, is a list of individual stock tickers with each element corresponding to an individual company.

stock.tickers<-pblapply(stock.links,get.ticker)

One way to change a list of elements into a flat object is with do.call(). Here, you are applying rbind to row bind each list element into a single-column matrix. Lastly, you create a data frame with the symbol and company name information.

stock.ticks<-do.call(rbind,stock.tickers)
stock.ticks<-data.frame(symbol=stock.ticks,name=stocks.names)
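A small sketch of the flattening step, using a three-element list in place of the scraped tickers. It also shows unlist(), a base R alternative that returns a plain character vector rather than a matrix.

```r
# Stand-in for the list returned by pblapply(stock.links, get.ticker)
stock.tickers <- list('BABA', 'AMC', 'TWTR')

# rbind stacks each element as a row: a 3 x 1 character matrix
ticks.matrix <- do.call(rbind, stock.tickers)
dim(ticks.matrix)

# unlist() gives a plain character vector with the same values
ticks.vector <- unlist(stock.tickers)

stock.ticks <- data.frame(symbol = ticks.vector,
                          name = c('Alibaba', 'AMC Entertainment', 'Twitter'),
                          stringsAsFactors = FALSE)
stock.ticks
```

Setting stringsAsFactors = FALSE keeps the symbols as character strings regardless of the R version in use.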

To be consistent in your analysis, you may want to limit the amount of historical information you gather on each stock. The Sys.Date() function returns the current date as a date object, stored as year, month and then day. Subtracting years() from lubridate with an integer is one way to step back a specific amount of time. Note the naming in the code below: start.date holds today’s date and end.date holds the date three years earlier.

start.date<-Sys.Date()
end.date<-Sys.Date()-years(3)

To get the Yahoo finance data, the date object has to be changed to simple character objects without a dash. Using the global substitution function gsub() on both start.date and end.date will change the class and simultaneously remove dashes. Within gsub(), pass in the character pattern to search for, then the replacement characters. In this case the replacing pattern is an empty character in between quotes. The last parameter is the object that gsub() will be applied to.

start.date<-gsub('-','', start.date)
end.date<-gsub('-','', end.date)
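The date handling can be checked on a fixed date. This is a base R sketch under two substitutions that are not in the original code: seq() replaces lubridate’s years(), and format() is shown as another way to build the dash-free string.

```r
# A fixed date keeps the example reproducible; the post uses Sys.Date()
today <- as.Date('2017-02-09')

# Base R alternative to subtracting lubridate's years(3)
three.years.ago <- seq(today, by = '-3 years', length.out = 2)[2]

# gsub() coerces the Date to character, then strips the dashes
today.chr <- gsub('-', '', today)
today.chr

# format() produces the same kind of string directly from the Date
earlier.chr <- format(three.years.ago, '%Y%m%d')
earlier.chr
```

Either route ends with a character string, so the choice is mostly a matter of taste.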

The TTR function getYahooData() accepts a stock symbol, and a starting and ending date. The function returns time series data: each row is a date and the columns contain information such as the “Open”, “High”, “Low” and “Closing” price for an equity. Since you are looking up multiple companies, you can use lapply() or pblapply(). Pass in the vector of company symbols, then the function, getYahooData(), and then the date information. The date objects are recycled parameters each time getYahooData() is applied to a stock symbol. Note that end.date is passed first because it holds the earlier of the two dates.

stocks.ts<-pblapply(stock.ticks$symbol,getYahooData,end.date, start.date)

To make the returned list, stocks.ts, easier to navigate, you can add names to the list elements. Using names() with the stocks.ts object, declare the names as the original $symbol vector.

names(stocks.ts)<-stock.ticks$symbol
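The pattern of passing extra arguments through lapply() and then naming the result can be tried without a data source. In this sketch fake.lookup() is a hypothetical stand-in for getYahooData() that simply echoes its inputs.

```r
# Hypothetical stand-in for getYahooData(): returns a one-row data frame
fake.lookup <- function(symbol, start, end) {
  data.frame(symbol = symbol, start = start, end = end,
             stringsAsFactors = FALSE)
}

symbols <- c('AMC', 'TWTR')

# The two dates after the function name are passed unchanged on every call
results <- lapply(symbols, fake.lookup, '20140209', '20170209')

# Without names, elements are only reachable by position, e.g. results[[2]]
names(results) <- symbols
results$TWTR
```

pblapply() takes its arguments in the same order, so swapping it in should only add the progress bar.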

When working with large lists, I like to examine the resulting object to make sure the outcome is what I expected. Now that the elements have names, you can reference them directly. In this example, you are examining the first 6 rows for AMC Entertainment Holdings (AMC). Using head() on the list while referencing $AMC will return a portion of the time series for this stock:

head(stocks.ts$AMC)

Examining the Stock Data

When I listen to financial news, commentators often refer to charts. Despite high frequency trading and active management performed by others, many small investors still refer to charts to gain insight. The time series object can be quickly displayed using plot(). Pass in the list referring to the named element, such as $AMZN, and then the column you want to display, here $Close.

plot(stocks.ts$AMZN$Close)

The preceding plot is static and not very interesting.

Let’s use a JavaScript library to make a chart you can explore. In this code snippet, you may observe the “%>%” or pipe operator. The pipe operator is a good way to write concise code. It forwards an object to the next function without forcing you to rewrite an object name like you did earlier in this post.

In this example, you create a dygraph referring to the Twitter stock, $TWTR, and then the column you want to plot, $Close. Within dygraph, main adds a title that is specified in between the quotes. Using the “%>%” this entire object is forwarded to the next function dyRangeSelector(). You can specify a default date range using c() with a start and end date string. The resulting HTML object is a dynamic time series for Twitter’s stock with a date slider at the bottom.

Remember, to change the equity displayed, change the ticker symbol in the stocks.ts list and then the graph title.

dygraph(stocks.ts$TWTR$Close, main = "TWTR Stock Price") %>%
  dyRangeSelector(dateWindow = c("2013-12-18", "2016-12-30"))

This is a basic dygraph for Twitter’s stock

A Simple Trading Strategy: Trend Following

High frequency traders and hedge funds use sophisticated models and rules-based approaches to execute trades. If you want to learn more, I suggest visiting www.quantopian.com for advanced approaches. For simpler approaches, start with this page at www.Investopedia.com.

In the code below, you will visualize a simple momentum trading strategy. Basically, you would want to calculate the 200 day and 50 day moving averages for a stock price. On any given day that the 50 day moving average is above the 200 day moving average, you would buy or hold your position. On days where the 200 day average is more than the 50 day moving average, you would sell your shares. This strategy is called a trend following strategy. The positive or negative difference between the two moving averages represents the stock’s momentum.

The TTR package provides SMA() for calculating a simple moving average. In this code snippet, you are examining the first 6 values for Twitter’s 200 and 50 day moving averages. SMA() works by passing in the time series data for a stock and a specific column like Close. This is a single vector of closing prices for the TWTR stock. The second parameter is an integer representing the number of observations for the moving average. Because an n day average needs n observations, these first 6 values are NA. Without using head() the SMA() function will return all values.

head(SMA(stocks.ts$TWTR$Close, 200))
head(SMA(stocks.ts$TWTR$Close, 50))  
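To make the calculation concrete, here is a base R sketch of a trailing simple moving average on six made-up prices. simple.ma() is a hypothetical helper written for this illustration, not part of TTR, although its output should line up with what SMA() returns for the same inputs.

```r
# Trailing simple moving average: the mean of the current value
# and the n - 1 values before it
simple.ma <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

prices <- c(10, 11, 12, 13, 14, 15)

# The first n - 1 entries are NA because the window is not yet full
ma3 <- simple.ma(prices, 3)
ma3
```

The third value is the mean of 10, 11 and 12, the fourth is the mean of 11, 12 and 13, and so on.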

Now that you have examined the moving average function in detail, you need to apply it to each of the 70 stocks. stocks.ts is a list of 70 data frames containing individual stock data. The fourth column of each data frame contains the closing price that we want to use for the moving averages.

The custom function mov.avgs() accepts a single stock data frame and calculates the moving averages. The first line selects the closing prices by indexing [,4] to create stock.close. Next, the function checks the number of rows in the data frame. If nrow is less than (2*260), the function adds the two moving average columns filled with NA.

I chose this number because there are about 250 trading days a year, so the check confirms that the time series covers roughly 2 years or more. Loyal3 sometimes can get access to IPOs, and if the stock is newly public there will not be enough data for a 200 day moving average. Otherwise, when the nrow value is at least 2*260, the function creates a data frame with the original data along with the 200 and 50 day moving averages as new columns. Using colnames, I declare the column names. The last part of the function uses complete.cases to check for values in the 200 day moving average column. Any rows that do not have a value are dropped in the final result, so a stock with too little history comes back with zero rows.

mov.avgs<-function(stock.df){
  stock.close<-stock.df[,4]
  # if/else rather than ifelse(): the test is a single TRUE/FALSE, not a vector
  if (nrow(stock.df)<(2*260)) {
    # too little history: real NA values (not the string 'NA') so they can be dropped
    x<-data.frame(stock.df, NA_real_, NA_real_)
  } else {
    x<-data.frame(stock.df, SMA(stock.close, 200), SMA(stock.close, 50))
  }
  colnames(x)<-c(names(stock.df), 'sma_200','sma_50')
  # keep only the rows that have a 200 day moving average
  x<-x[complete.cases(x$sma_200),]
  return(x)
}

Armed with this mov.avgs() function you can use pblapply() to add the moving average calculations to each of the 70 data frames.

stocks.ts<-pblapply(stocks.ts, mov.avgs)

Use the code below to visualize a stock’s moving averages using a dygraph. Once again, this code uses the “%>%” operator to forward objects. The dygraph() function accepts the stocks.ts$FOX data frame, indexed by column name with c('sma_200','sma_50') so that only the two moving averages are plotted. This object is passed to dySeries() in the next 2 lines. Because you can refer to a column by name, each dySeries() call configures one line, for the “sma_50” and “sma_200” values respectively. The object is forwarded again to dyRangeSelector() to adjust the selector’s height. Lastly, I added some shading with dyShading() to define periods when you would have wanted to buy or hold the equity and a period when you should have sold your shares or stayed away depending on your position.

dygraph(stocks.ts$FOX[,c('sma_200','sma_50')],main = 'FOX Moving Averages') %>%
  dySeries('sma_50', label = 'sma 50') %>%
  dySeries('sma_200', label = 'sma 200') %>%
  dyRangeSelector(height = 30) %>%
  dyShading(from = '2016-4-28', to = '2016-7-27', color = '#CCEBD6') %>%
  dyShading(from = '2016-7-28', to = '2016-12-30', color = '#FFE6E6')

Here is the final result in an interactive time series.

The FOX moving averages with shaded regions for buying/holding versus selling
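The shading dates above were typed in by hand. As a rough sketch of how they could be derived instead, the code below finds each run of consecutive dates on which the 50 day average is above or below the 200 day average. The numbers are invented and the column names follow mov.avgs(); the date column is an assumption made for this example.

```r
# Made-up moving averages for six consecutive dates
ma <- data.frame(
  date    = as.Date('2016-12-23') + 0:5,
  sma_50  = c(27.0, 27.2, 27.4, 27.3, 27.1, 27.5),
  sma_200 = c(27.3, 27.3, 27.3, 27.2, 27.2, 27.2)
)

# TRUE on days the rule says buy or hold
ma$hold <- ma$sma_50 > ma$sma_200

# Collapse consecutive days with the same signal into one period each
runs   <- rle(ma$hold)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

periods <- data.frame(from = ma$date[starts],
                      to   = ma$date[ends],
                      hold = runs$values)
periods
```

Each row of periods holds a from and to date that could be handed to dyShading(), with the color chosen from the hold column.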

Conclusion

As a budding algorithmic trader, you do not need to plot all 70 shares. Instead, you would want to run the code every day and add a programmatic way to identify stocks that fit the rule based method, “buy if the 50 day moving average is above the 200 day moving average”. As you review the preceding chart, the green section is a time in which you would buy the FOX equity. The red section represents the time to sell your shares and not reenter.
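One possible shape for that programmatic check is sketched below. The three tickers and their values are invented, and is.buy() and is.new.cross() are hypothetical helpers; the only thing taken from this post is the sma_50 and sma_200 column naming.

```r
# Made-up data in the shape mov.avgs() returns (only the two SMA columns)
stocks.ma <- list(
  AAA = data.frame(sma_200 = c(10.0, 10.1, 10.2), sma_50 = c(9.8, 10.0, 10.3)),
  BBB = data.frame(sma_200 = c(20.0, 20.0, 20.0), sma_50 = c(21.0, 21.1, 21.2)),
  CCC = data.frame(sma_200 = c(5.0, 5.0, 5.0),    sma_50 = c(5.2, 5.1, 4.9))
)

# TRUE when the most recent 50 day average is above the 200 day average
is.buy <- function(df) {
  n <- nrow(df)
  n > 0 && df$sma_50[n] > df$sma_200[n]
}

# TRUE when that was not yet the case one row earlier (a fresh crossover)
is.new.cross <- function(df) {
  n <- nrow(df)
  n > 1 && is.buy(df) && df$sma_50[n - 1] <= df$sma_200[n - 1]
}

buy.list  <- names(stocks.ma)[vapply(stocks.ma, is.buy, logical(1))]
new.cross <- names(stocks.ma)[vapply(stocks.ma, is.new.cross, logical(1))]
buy.list
new.cross
```

Here AAA and BBB satisfy the rule on the last row, while only AAA crossed on that row. The row count guards mean a stock with an empty data frame is skipped rather than causing an error.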

Since the graph is interactive, you can use the slider to resize the visual. Based on this simple algo trading approach, now may be a good time to buy FOX! December 30, 2016 was a trading day where the 50 day moving average moved $0.01 higher than the 200 day moving average!

The zoomed section of the FOX equity

Of course, remember all investments can lose value. To learn more about finance and algo trading, check out DataCamp’s courses here.
