Autocorrelation in R

Practice autocorrelation in R by using course material from DataCamp's Introduction to Time Series Analysis course.

Oct 18, 2018 · 6 min read

If you want to take our Introduction to Time Series Analysis in R course, here is the link.

Calculating Autocorrelations

Autocorrelations or lagged correlations are used to assess whether a time series is dependent on its past. For a time series x of length n we consider the n-1 pairs of observations one time unit apart. The first such pair is (x[2],x[1]), and the next is (x[3],x[2]). Each such pair is of the form (x[t],x[t-1]) where t is the observation index, which we vary from 2 to n in this case. The lag-1 autocorrelation of x can be estimated as the sample correlation of these (x[t], x[t-1]) pairs.

In general, we can manually create these pairs of observations. First, create two vectors, x_t0 and x_t1, each with length n-1, such that the rows correspond to (x[t], x[t-1]) pairs. Then apply the cor() function to estimate the lag-1 autocorrelation.
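The manual approach can be sketched as follows. The simulated AR(1) series here is only a stand-in for the preloaded `x`; the seed and model parameters are assumptions made so the example runs on its own.

```r
# Stand-in for the preloaded series: a simulated AR(1) process (assumption)
set.seed(9876)
x <- arima.sim(model = list(ar = 0.75), n = 150)
n <- length(x)

# Drop the first observation to get x[2], ..., x[n]
x_t0 <- x[-1]

# Drop the last observation to get x[1], ..., x[n-1]
x_t1 <- x[-n]

# Each row should be an (x[t], x[t-1]) pair
head(cbind(x_t0, x_t1))

# Scatterplot of each observation against its predecessor
plot(x_t0, x_t1)

# Sample correlation of the pairs: a manual lag-1 autocorrelation estimate
cor(x_t0, x_t1)
```

Negative indexing does the alignment: `x[-1]` removes the first element and `x[-n]` removes the last, so the two vectors line up as current value and previous value.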

Luckily, the acf() command provides a shortcut. Applying acf(..., lag.max = 1, plot = FALSE) to a series x automatically calculates the lag-1 autocorrelation.

Finally, note that the two estimates differ slightly because they scale the sample covariance differently, by 1/(n-1) versus 1/n. Although the latter gives a biased estimate, it is preferred in time series analysis, and the resulting autocorrelation estimates differ only by a factor of approximately (n-1)/n.
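To see where the small difference comes from, the sketch below compares `cor()` with `acf()` and also writes out the `acf()` estimate by hand. The simulated series is again an assumed stand-in for the preloaded `x`, and the names `manual`, `auto`, and `by_hand` are illustrative only.

```r
# Stand-in for the preloaded series (assumption)
set.seed(9876)
x <- arima.sim(model = list(ar = 0.75), n = 150)
n <- length(x)

# Manual estimate: sample correlation of the (x[t], x[t-1]) pairs
manual <- cor(x[-1], x[-n])

# Automatic estimate: element 1 of $acf is lag 0, element 2 is lag 1
auto <- acf(x, lag.max = 1, plot = FALSE)$acf[2]

# The acf() estimate written out: one overall mean, n-1 cross products
# in the numerator, all n squared deviations in the denominator
xbar <- mean(x)
by_hand <- sum((x[-1] - xbar) * (x[-n] - xbar)) / sum((x - xbar)^2)

c(manual = manual, acf = auto, by_hand = by_hand)

# Rescaling the manual estimate brings it close to the acf() value
manual * (n - 1) / n
```

The rescaled value is close to, but not exactly equal to, the `acf()` estimate, because `cor()` centers and scales the two sub-vectors separately while `acf()` uses the mean and variance of the full series.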

In this exercise, you'll practice both the manual and automatic calculation of a lag-1 autocorrelation. The time series x and its length n (150) have already been loaded. The series is shown in the plot on the right.

Instructions

Create two vectors, x_t0 and x_t1, each with length n-1 such that the rows correspond to the (x[t], x[t-1]) pairs.
Confirm that x_t0 and x_t1 are (x[t], x[t-1]) pairs using the pre-written code.
Use plot() to view the scatterplot of x_t0 and x_t1.
Use cor() to view the correlation between x_t0 and x_t1.
Use acf() with x to automatically calculate the lag-1 autocorrelation. Set the lag.max argument to 1 to produce a single lag period and set the plot argument to FALSE.
Confirm that the difference factor is (n-1)/n using the pre-written code.


If that makes sense, keep going to the next exercise! If not, here is an overview video.

Overview Video on Autocorrelation

The Autocorrelation Function

Autocorrelations can be estimated at many lags to better assess how a time series relates to its past. We are typically most interested in how a series relates to its most recent past.

The acf(..., lag.max = ..., plot = FALSE) function estimates the autocorrelations at lags 0, 1, 2, ..., up to the value specified by the argument lag.max. In the previous exercise, you focused on the lag-1 autocorrelation by setting the lag.max argument to 1.
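As a rough illustration, the estimates can also be pulled out of the object that `acf()` returns rather than read off the printed output. The simulated series is an assumed stand-in for the preloaded `x`, and `acf_x` and `est` are illustrative names.

```r
# Stand-in for the preloaded series (assumption)
set.seed(9876)
x <- arima.sim(model = list(ar = 0.75), n = 150)

# Estimate the autocorrelations at lags 0 through 10 without plotting
acf_x <- acf(x, lag.max = 10, plot = FALSE)
acf_x

# The lags and estimates are stored as arrays; flatten them into a table
est <- data.frame(lag = as.vector(acf_x$lag),
                  acf = as.vector(acf_x$acf))

# Look up the estimates at lag 5 and lag 10
est[est$lag %in% c(5, 10), ]
```

Because the first element corresponds to lag 0, the lag-k estimate sits at position k + 1 of `acf_x$acf`.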

In this exercise, you'll explore some further applications of the acf() command. Once again, the time series x has been preloaded for you and is shown in the plot on the right.

Instructions

Use acf() to view the autocorrelations of series x from 0 to 10. Set the lag.max argument to 10 and keep the plot argument as FALSE.
Copy and paste the autocorrelation estimate (ACF) at lag-10.
Copy and paste the autocorrelation estimate (ACF) at lag-5.


Visualizing the Autocorrelation Function

Estimating the autocorrelation function (ACF) at many lags allows us to assess how a time series x relates to its past. The numeric estimates are important for detailed calculations, but it is also useful to visualize the ACF as a function of the lag.

In fact, the acf() command produces a figure by default. It also makes a default choice for lag.max, the maximum number of lags to be displayed.
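The default choice can be checked directly. According to the `?acf` help page, the default `lag.max` is 10*log10(N/m), where N is the number of observations and m the number of series; the sketch below compares that with what `acf()` returns for a simulated stand-in series (an assumption, as the original `x` is preloaded).

```r
# Stand-in for the preloaded series (assumption)
set.seed(9876)
x <- arima.sim(model = list(ar = 0.75), n = 150)

# With no extra arguments, acf() draws the plot and returns the
# estimates invisibly, so they can still be captured
default_acf <- acf(x)

# Largest lag that acf() chose on its own
max(default_acf$lag)

# Documented default for a single series, rounded down
floor(10 * log10(length(x)))
```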

Three time series x, y, and z have been loaded into your R environment and are plotted on the right. The time series x shows strong persistence, meaning the current value is relatively close to those that precede it. The time series y shows a periodic pattern with a cycle length of approximately four observations, meaning the current value is relatively close to the observation four before it. The time series z does not exhibit any clear pattern.

In this exercise, you'll plot an estimated autocorrelation function for each time series. In the plots produced by acf(), the lag for each autocorrelation estimate is denoted on the horizontal axis and each autocorrelation estimate is indicated by the height of the vertical bars. Recall that the ACF at lag-0 is always 1.

Finally, each ACF figure includes a pair of blue, horizontal, dashed lines representing lag-wise 95% confidence intervals centered at zero. These are used for determining the statistical significance of an individual autocorrelation estimate at a given lag versus a null value of zero, i.e., no autocorrelation at that lag.
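The position of the dashed lines can be reproduced numerically. This sketch assumes the default settings of the plot (a 95% level under a white-noise null), for which the bounds are at plus and minus qnorm(0.975)/sqrt(n); the simulated series and the names `bound` and `flags` are illustrative assumptions.

```r
# Stand-in for the preloaded series (assumption)
set.seed(9876)
x <- arima.sim(model = list(ar = 0.75), n = 150)

acf_x <- acf(x, plot = FALSE)

# Half-width of the 95% band around zero; n.used is the series length
bound <- qnorm(0.975) / sqrt(acf_x$n.used)
bound

# Flag the lags whose estimate falls outside the band (lag 0 dropped,
# since it is always 1)
flags <- data.frame(lag = as.vector(acf_x$lag),
                    acf = as.vector(acf_x$acf))
flags$outside <- abs(flags$acf) > bound
flags[-1, ]
```

Since the band is applied lag by lag, a few estimates can fall outside it by chance even when the series has no autocorrelation.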

Instructions

Use three calls of the function acf() to display the estimated ACFs of each of your three time series (x, y, and z). There is no need to specify additional arguments in your calls to acf().
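A minimal sketch of the three calls is shown below. The three simulated series are assumed stand-ins for the preloaded `x`, `y`, and `z`: a persistent AR(1) process, an autoregression on the fourth lag to mimic the four-observation cycle, and white noise. The `par()` call is optional and only stacks the plots.

```r
# Stand-ins for the three preloaded series (assumptions)
set.seed(4366)
x <- arima.sim(model = list(ar = 0.75), n = 150)              # persistent
y <- arima.sim(model = list(ar = c(0, 0, 0, 0.88)), n = 150)  # period of about 4
z <- arima.sim(model = list(order = c(0, 0, 0)), n = 150)     # white noise

# Stack the three ACF plots in one column
par(mfrow = c(3, 1))

# View the ACF of x
acf(x)

# View the ACF of y
acf(y)

# View the ACF of z
acf(z)
```

For series like these, one would expect the ACF of `x` to decay gradually, the ACF of `y` to peak at multiples of four, and the ACF of `z` to stay near zero after lag 0.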


If you want to learn more from this course, here is the link.

Check out our Time Series Analysis using R: Tutorial.
