
Notes

Module 1

Time-series:

Time-ordered signal. Think stocks, air temperature, and more.

Correlation coefficient

Measure of how much two series vary together. Indicative of the relationship between two phenomena. Computes how closely the data are clustered along a line.

  • Spurious correlation: Two trending series may erroneously seem correlated if their levels are compared. To combat spurious correlation, compare percent changes (e.g., returns) instead [1], as in the sketch below.
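
A minimal sketch of the effect, using two made-up, independently simulated price series (the drift and volatility values are arbitrary):

import numpy as np
import pandas as pd

np.random.seed(0)
# Two independent simulated "prices": exponentiated random walks, so both trend upward.
p1 = pd.Series(100 * np.exp(np.cumsum(np.random.normal(0.001, 0.01, 500))))
p2 = pd.Series(100 * np.exp(np.cumsum(np.random.normal(0.001, 0.01, 500))))

print("levels:  ", p1.corr(p2))                            # typically far from zero
print("returns: ", p1.pct_change().corr(p2.pct_change()))  # close to zero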

Linear regression

Finds the slope $\beta$ and the intercept $\alpha$ of the line $y = \alpha + \beta x$ that is the best fit between an independent variable $x$ and a dependent variable $y$.

$R^2$: Measures how well the linear regression line fits the data. Tells us how much of the variability of the dependent variable $y$ is explained by the independent variable $x$.
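
A minimal sketch of fitting a line with statsmodels OLS; the data here are hypothetical (a made-up slope of 2.0 and intercept of 0.5 plus noise):

import numpy as np
import statsmodels.api as sm

np.random.seed(1)
# Hypothetical data where y depends linearly on x plus noise.
x = np.random.normal(size=200)
y = 0.5 + 2.0 * x + np.random.normal(size=200)

# add_constant adds the intercept term; params holds [intercept, slope].
results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.params)
print(results.rsquared)  # share of the variance of y explained by x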

Autocorrelation

The correlation of a single series with a time-lagged copy of itself.

  • Negative autocorrelation: With financial time-series it is called "mean-reverting".
  • Positive autocorrelation: With financial time-series it is called "Trend-following" or "momentum".
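
A small sketch using pandas' built-in autocorr; the return series is hypothetical, constructed so that it mean-reverts over one period:

import numpy as np
import pandas as pd

np.random.seed(2)
# Hypothetical return series with built-in one-period mean reversion.
noise = np.random.normal(size=501)
returns = pd.Series(noise[1:] - 0.5 * noise[:-1])

# Lag-1 autocorrelation: negative -> mean-reverting, positive -> momentum.
print(returns.autocorr(lag=1))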

Misc

Since stocks have historically had negative autocorrelation over horizons of about a week, one popular strategy is to buy stocks that have dropped over the last week and sell stocks that have gone up. Buy losers, sell winners. Other assets like commodities and currencies have historically had positive autocorrelation over horizons of several months. Buy winners, sell losers.


[1]
Hidden code

Module 2

Autocorrelation function (ACF)

Shows the entire autocorrelation function for different lags. Any significant non-zero autocorrelation implies that the series can be forecast using past values. Any lags within the confidence interval (specified by $\alpha$) are deemed statistically insignificant (statistically indistinguishable from 0), i.e., they do not contain useful information for forecasting [2]. For example, an $\alpha$ value of 0.05 means that if the true autocorrelation at that lag is zero, there is only a 5% chance that the sample autocorrelation will fall outside that band. In other words, we are measuring how uncertain we are of the sample estimate. Under some simplifying assumptions, we can compute the confidence interval as $\pm \frac{z_{\alpha/2}}{\sqrt{N}}$, where $N$ is the number of observations.
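
A minimal sketch, assuming a simulated pure-noise series, of plotting the ACF with a 95% confidence band and computing the band half-width from the formula above (plot_acf's default band uses a slightly more refined formula, so the two need not match exactly):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from statsmodels.graphics.tsaplots import plot_acf

np.random.seed(3)
series = np.random.normal(size=250)  # pure noise, so all autocorrelations should be ~0

# alpha=0.05 draws a 95% confidence band; lags inside the band are indistinguishable from 0.
plot_acf(series, lags=20, alpha=0.05)
plt.show()

# Under the simplifying assumptions, the band half-width is roughly z_{alpha/2} / sqrt(N).
print(norm.ppf(0.975) / np.sqrt(len(series)))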

White noise

A white noise time series is simply a sequence of uncorrelated, identically distributed random variables. Some of its properties:

  • Mean that is constant over time
  • Variance that is constant over time
  • Autocorrelation that is zero at all lags
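
A quick numerical check of these properties on simulated Gaussian white noise (the 2% mean and 1% standard deviation are arbitrary):

import numpy as np
import pandas as pd

np.random.seed(4)
# Gaussian white noise with a hypothetical mean of 2% and standard deviation of 1%.
wn = pd.Series(np.random.normal(loc=0.02, scale=0.01, size=1000))

# Mean and variance are roughly the same in each half of the sample ...
print(wn[:500].mean(), wn[500:].mean())
print(wn[:500].var(), wn[500:].var())
# ... and the sample autocorrelation is close to zero at every lag.
print([round(wn.autocorr(lag=k), 3) for k in (1, 2, 3)])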

Random walk

Today's price is yesterday's price plus some noise: $P_t = P_{t-1} + \epsilon_t$. The change in price of a random walk is just white noise: $P_t - P_{t-1} = \epsilon_t$. The bottom line is that if stock prices follow a random walk, then stock returns are white noise. You can't forecast a random walk; the best guess for tomorrow's price is simply today's price. In a random walk with drift, prices on average drift by $\mu$ every period: $P_t = \mu + P_{t-1} + \epsilon_t$. Incidentally, the change in price is still white noise but with a mean of $\mu$: $P_t - P_{t-1} = \mu + \epsilon_t$. If we instead think of stock prices as a random walk with drift, then the returns are still white noise, but with an average return of $\mu$ instead of zero. To test whether a series like stock prices follows a random walk, you can regress current prices on lagged prices:

  • Random walk with drift: $P_t = \mu + P_{t-1} + \epsilon_t$
  • Regression test for random walk: $P_t = \alpha + \beta P_{t-1} + \epsilon_t$
  • Test $H_0: \beta = 1$ (if the slope coefficient is not statistically different from one, then we can't reject the null hypothesis that the series is a random walk)

Equivalently, we can instead regress the difference in prices on the lagged price and, instead of testing whether the slope coefficient is one, test whether it is zero: $P_t - P_{t-1} = \alpha + \beta P_{t-1} + \epsilon_t$, with $H_0: \beta = 0$.

This is called the Dickey-Fuller test. If you add more lagged prices to the right hand side, then it's called the Augmented Dickey-Fuller test [3].
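
A minimal sketch of the Augmented Dickey-Fuller test with statsmodels' adfuller, applied to a simulated random walk with drift rather than the actual Amazon series used in the hidden cells:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

np.random.seed(5)
# Hypothetical price series simulated as a random walk with drift.
prices = pd.Series(100 + np.cumsum(np.random.normal(0.05, 1.0, 1000)))

# Null hypothesis of the (augmented) Dickey-Fuller test: the series has a unit
# root, i.e., it is a random walk. The second element of the result is the p-value.
print("p-value, prices: ", adfuller(prices)[1])
print("p-value, returns:", adfuller(prices.pct_change().dropna())[1])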

Stationarity

In its strictest definition, it means that the joint distribution of the data does not depend on time (time-invariant). A less restrictive version of stationarity, and one that is easier to test, is weak stationarity: the mean, variance, and autocorrelations of the data are time-invariant (i.e., the autocorrelation is only a function of the lag $\tau$ and not of the time $t$). If a process is not stationary, it becomes difficult to model (to create a simplified system for prediction). Modeling involves estimating a set of parameters; if a process is not stationary, the parameters are different at each point in time, so there are too many parameters to estimate, and you might end up with more parameters than data. With stationary data you can estimate a parsimonious model with just a few parameters. If a process is a random walk, taking first differences yields white noise, which is stationary.

Incidentally, taking the first difference of a random walk with drift would also result in a white noise process, but with an offset equal to the drift $\mu$.

Moreover, a log transform removes exponential growth, and seasonality can be eliminated by differencing with a lag corresponding to the periodicity (called seasonal adjustment).
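
A small sketch of these transforms on a made-up monthly series with exponential growth and 12-month seasonality:

import numpy as np
import pandas as pd

np.random.seed(6)
# Hypothetical monthly series with exponential growth and a 12-month seasonal pattern.
t = np.arange(240)
series = pd.Series(np.exp(0.01 * t) * (10 + 2 * np.sin(2 * np.pi * t / 12))
                   + np.random.normal(0, 0.1, 240))

log_series = np.log(series)            # log turns exponential growth into a linear trend
deseasonalized = log_series.diff(12)   # differencing at the seasonal lag (12) removes seasonality
stationary = deseasonalized.diff()     # first differencing removes the remaining trend/drift
print(stationary.dropna().head())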

Misc

  • Whereas stock returns are often modeled as white noise, stock prices closely follow a random walk. In other words, today's price is yesterday's price plus some random noise.
  • Many time series, like stock prices, are random walks but tend to drift up over time.
  • Random walks are a type of non-stationary series. For example, if stock prices are a random walk, then the uncertainty about prices tomorrow is much smaller than the uncertainty 10 years from now.
[2]
Hidden code
[3]
Hidden code

If the p-value is less than 5%, we can reject the null hypothesis that the series is a random walk with 95% confidence.

In this case (the adjusted stock price for Amazon), the p-value is much higher than 0.05; it's ~1.0. Therefore, we can't reject the null hypothesis that the Amazon price series is a random walk.

Interestingly, if we perform the same test on Amazon returns (percent change), we reject the null hypothesis; in other words, the stock returns are not a random walk process.

Module 3

AR

Today's value equals a mean plus a fraction $\phi$ of yesterday's value, plus noise: $R_t = \mu + \phi R_{t-1} + \epsilon_t$.

Since there's only a lag of one in the equation, this is called an AR model of order 1, or simply an AR(1). If the AR parameter $\phi$ is equal to one, then the process is a random walk with drift, $R_t = \mu + R_{t-1} + \epsilon_t$. Conversely, if $\phi$ is equal to zero, then the process is white noise, $R_t = \mu + \epsilon_t$.

In order for the process to be stable and stationary, $\phi$ has to lie in the range $-1 < \phi < 1$. As an example, suppose $R_t$ models a stock return: if $\phi$ is negative, then the stock is mean reverting, so a positive return last period, at time $t-1$, implies that this period's return is more likely to be negative. On the other hand, if $\phi$ is positive, then a positive return last period implies that this period's return is expected to be positive (momentum/trend-following).

[4] has 4 simulated time series with different AR parameters. When $\phi$ is large and positive (e.g., 0.9), the series looks close to a random walk. Conversely, when $\phi$ is large and negative (e.g., -0.9), the process looks more erratic: a large positive value is usually followed by a large negative value. The bottom two, with smaller $|\phi|$, are similar but less exaggerated and closer to white noise. The autocorrelation decays exponentially at a rate of $\phi$. Therefore, if $\phi$ is 0.9, the lag-1 autocorrelation is 0.9, the lag-2 autocorrelation is $0.9^2 = 0.81$, the lag-3 autocorrelation is $0.9^3 = 0.729$, and so on (a smaller AR parameter gives a steeper decay). When $\phi$ is negative, the autocorrelation function still decays exponentially, but the sign of the autocorrelation reverses at each lag.
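
A hedged sketch of how such AR(1) series could be simulated with statsmodels' ArmaProcess (the φ values of ±0.9 are illustrative); note that, unlike for the MA coefficients in the code further below, the sign of the AR coefficients must be reversed:

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima_process import ArmaProcess

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6))

for col, phi in enumerate([0.9, -0.9]):
    # In ArmaProcess the sign of the AR coefficients is reversed:
    # an AR(1) with parameter phi is specified as [1, -phi].
    simulated_data = ArmaProcess([1, -phi], [1]).generate_sample(nsample=500)
    axes[0, col].plot(simulated_data)
    axes[0, col].set_title(f"φ = {phi}")
    # The ACF decays at rate phi; for negative phi the sign alternates at each lag.
    plot_acf(simulated_data, lags=20, alpha=0.05, title=f"ACF, φ = {phi}", ax=axes[1, col])

plt.show()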

Higher order AR models

  • AR(1): $R_t = \mu + \phi_1 R_{t-1} + \epsilon_t$
  • AR(2): $R_t = \mu + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \epsilon_t$
  • AR(3): $R_t = \mu + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \phi_3 R_{t-3} + \epsilon_t$
  • ...

MA

Today's value equals a mean $\mu$ and some noise $\epsilon_t$, plus a fraction $\theta$ of yesterday's noise: $R_t = \mu + \epsilon_t + \theta \epsilon_{t-1}$.

Since there's only one lag, it's an MA model of order one, or simply an MA(1) model. If the MA parameter $\theta$ is zero, then it's simply white noise; $\mu$ acts as a DC signal, only displacing the series vertically.

MA models are stationary for all values of $\theta$, unlike AR models, for which $\phi$ has to lie in $(-1, 1)$.

Interpretation of an MA(1) model
  • Negative $\theta$: One-period mean reversion
  • Positive $\theta$: One-period momentum
  • Note: The one-period (lag-1) autocorrelation is $\theta / (1 + \theta^2)$, not $\theta$ (verified numerically below)
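
A quick check of the lag-1 formula against the theoretical ACF that ArmaProcess computes, using an illustrative θ of 0.9:

from statsmodels.tsa.arima_process import ArmaProcess

theta = 0.9
# Theoretical lag-1 autocorrelation of an MA(1) is theta / (1 + theta**2), not theta.
print(theta / (1 + theta ** 2))                  # ~0.497
print(ArmaProcess([1], [1, theta]).acf(lags=2))  # [1.0, ~0.497]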

Higher order MA models

  • MA(1): $R_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1}$
  • MA(2): $R_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2}$
  • MA(3): $R_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \theta_3 \epsilon_{t-3}$
  • ...

Misc

  • High-frequency stock returns are a nice example of an MA(1) process.
  • Because stocks trade at discrete one-cent increments rather than at continuous prices, the price often bounces back and forth over a one-cent range for long periods of time. This is called the bid/ask bounce. The bid/ask bounce induces a significant negative lag-1 autocorrelation, but no autocorrelation beyond lag 1 [5].
Hidden code
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima_process import ArmaProcess

fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(10, 9))

THETA = [0.9, -0.9, 0.5, -0.5]

# Top two rows: simulated MA(1) series, one per theta.
for ax, theta in zip(axes[:2].ravel(), THETA):
    # no need to reverse sign like in AR
    ma = [1, theta]
    simulated_data = ArmaProcess([1], ma).generate_sample(nsample=500)
    ax.plot(simulated_data)
    ax.set_title(f"θ = {theta}")

# Bottom two rows: the corresponding autocorrelation functions.
for ax, theta in zip(axes[2:].ravel(), THETA):
    ma = [1, theta]
    simulated_data = ArmaProcess([1], ma).generate_sample(nsample=500)
    plot_acf(simulated_data, lags=20, alpha=0.05, title=f"θ = {theta}", ax=ax)

fig.suptitle("[4] Comparison of MA(1) series and autocorrelation function")
plt.show()

Module 4 - Estimating and forecasting an AR model

Identifying the order of an AR-model

  • Partial autocorrelation function: The partial autocorrelation function measures the incremental benefit of adding another lag. It represents how significant adding one more lag is when you already have n lags [7].
  • Information criteria: The more parameters in a model, the better the model will fit the data, but this can lead to overfitting. An information criterion adjusts the goodness-of-fit of a model by imposing a penalty based on the number of parameters used (think L1/L2 regularization).
  • Two popular adjusted goodness-of-fit measures: [8]
    • AIC (Akaike information criterion)
    • BIC (Bayesian information criterion). In practice, the best way to use BIC is to fit several models, each with a different number of parameters, and choose the one with the lowest information criterion [8][9] (see the sketch below).
Hidden code
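
A sketch of both ideas on simulated data: an AR(2) process is generated, its PACF is plotted, and AR models of increasing order are fit and compared by BIC. The AR(2) coefficients (0.6 and 0.3) are arbitrary, and the fitting uses the current statsmodels ARIMA interface rather than whatever the hidden cell uses:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

np.random.seed(7)
# Hypothetical data: simulate an AR(2) process, then try to recover its order.
ar2 = ArmaProcess([1, -0.6, -0.3], [1]).generate_sample(nsample=1000)

# PACF: only the first two lags should stick out of the confidence band for an AR(2).
plot_pacf(ar2, lags=20, alpha=0.05)
plt.show()

# Fit AR(p) models of increasing order and compare BIC; the true order (p=2)
# should give (close to) the lowest value.
for p in range(1, 5):
    results = ARIMA(ar2, order=(p, 0, 0)).fit()
    print(p, round(results.bic, 1))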