Notes
Module 1
Time-series:
Time-ordered signal. Think stocks, air temperature, and more.
Correlation coefficient
Measure of how much two series vary together. Indicative of a relationship between two phenomena. Computes how closely the data are clustered along a line.
- Spurious correlation: Two trending series may erroneously seem correlated if their levels are compared. To combat spurious correlation, compare percent changes (e.g., returns) instead of levels [1].
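A minimal sketch illustrating this (not from the course material): two independent random walks with drift show high correlation in levels but essentially none in percent changes.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Two independent random walks with upward drift: unrelated by
# construction, but both trend up over time.
a = pd.Series(100 + np.cumsum(0.1 + rng.normal(size=500)))
b = pd.Series(100 + np.cumsum(0.1 + rng.normal(size=500)))

print(a.corr(b))                            # typically high: spurious, driven by the shared trend
print(a.pct_change().corr(b.pct_change()))  # near zero: no real relationship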
Linear regression
Finds the slope $\beta$ (and intercept $\alpha$) of the line that best fits one series regressed on the other. For simple linear regression, $R^2$ is the square of the correlation coefficient.
Autocorrelation
The correlation of a single series with a time-lagged copy of itself
- Negative autocorrelation: With financial time series this is called "mean-reverting".
- Positive autocorrelation: With financial time series this is called "trend-following" or "momentum".
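A quick way to check this in pandas (a sketch; the price series here is simulated and stands in for real data):

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
prices = pd.Series(100 + np.cumsum(rng.normal(size=260)))

returns = prices.pct_change().dropna()
# Lag-1 autocorrelation: negative suggests mean reversion,
# positive suggests momentum. Near zero for white-noise returns.
print(returns.autocorr(lag=1))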
Misc
Since stocks have historically had negative autocorrelation over horizons of about a week, one popular strategy is to buy stocks that have dropped over the last week and sell stocks that have gone up: buy losers, sell winners. Other assets like commodities and currencies have historically had positive autocorrelation over horizons of several months: buy winners, sell losers.
Module 2
Autocorrelation function (ACF)
Shows the autocorrelation of the series at every lag. Any significant non-zero autocorrelation implies that the series can be forecast using past values. Lags whose autocorrelations fall inside the confidence interval (specified by the significance level $\alpha$; e.g., $\alpha = 0.05$ gives a 95% band) are not significantly different from zero.
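Plotting the ACF with statsmodels (a sketch; white noise is used here, so no lag should be significant):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

series = pd.Series(np.random.default_rng(2).normal(size=500))

# Shaded region is the 95% confidence band (alpha=0.05);
# bars outside it are significantly non-zero.
plot_acf(series, lags=20, alpha=0.05)
plt.show()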
White noise
A white noise time series is simply a sequence of uncorrelated random variables that are identically distributed. Below are some properties.
- Mean that is constant over time
- Variance that is constant over time
- Autocorrelation that is zero at all lags
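A sketch verifying these properties on simulated data:

import numpy as np
import pandas as pd

# White noise: i.i.d. draws from a fixed distribution.
noise = pd.Series(np.random.default_rng(3).normal(loc=0, scale=1, size=1000))

print(noise.mean())           # close to 0 (constant mean)
print(noise.std())            # close to 1 (constant variance)
print(noise.autocorr(lag=1))  # close to 0 (no autocorrelation)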
Random walk
Today's price is yesterday's price plus some noise, i.e., $P_t = P_{t-1} + \epsilon_t$.
- Random walk with drift: $P_t = \mu + P_{t-1} + \epsilon_t$, where $\mu$ is the average step (drift).
- Regression test for random walk: regress prices on lagged prices, $P_t = \alpha + \beta P_{t-1} + \epsilon_t$.
- Test whether the slope coefficient $\beta$ equals one; the null hypothesis is that the series is a random walk.
Equivalently, we can instead regress the difference in prices on the lagged price, $P_t - P_{t-1} = \alpha + \beta P_{t-1} + \epsilon_t$, and test whether the slope coefficient $\beta$ is zero.
This is called the Dickey-Fuller test. If you add more lagged prices to the right hand side, then it's called the Augmented Dickey-Fuller test [3].
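Running the ADF test with statsmodels (a sketch on simulated data; with real prices you would pass the price series instead):

import numpy as np
from statsmodels.tsa.stattools import adfuller

# Simulated random walk; the test should fail to reject the null.
walk = np.cumsum(np.random.default_rng(4).normal(size=500))

result = adfuller(walk)
print(f"test statistic: {result[0]:.3f}")
print(f"p-value: {result[1]:.3f}")  # large p-value: cannot reject "random walk"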
Stationarity
In its strictest definition, it means that the joint distribution of the data does not depend on time (time-invariant). A less restrictive version of stationarity, and one that is easier to test, is weak stationarity: the mean, variance, and autocorrelations of the data are time-invariant (i.e., for autocorrelation, $\mathrm{Corr}(X_t, X_{t-\tau})$ depends only on the lag $\tau$, not on $t$).
Incidentally, taking the first difference of a random walk with drift would also result in a white noise process, but with an offset equal to the drift $\mu$.
Moreover, a log transform removes exponential growth, and seasonality can be eliminated by differencing with a lag corresponding to the periodicity (called seasonal adjustment).
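A sketch of both transforms on a hypothetical monthly series (simulated here) with exponential growth and a 12-month seasonal pattern:

import numpy as np
import pandas as pd

t = np.arange(240)
rng = np.random.default_rng(5)
# Exponential growth modulated by a yearly cycle, plus noise.
series = pd.Series(np.exp(0.01 * t) * (10 + np.sin(2 * np.pi * t / 12)) + rng.normal(size=240))

log_diff = np.log(series).diff()  # log tames exponential growth, diff removes the trend
seasonal_diff = series.diff(12)   # differencing at lag 12 removes the yearly pattern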
Misc
- Whereas stock returns are often modeled as white noise, stock prices closely follow a random walk. In other words, today's price is yesterday's price plus some random noise.
- Many time series, like stock prices, are random walks but tend to drift up over time.
- Random walks are a type of non-stationary series: the variance of the forecast grows with the horizon. For example, if stock prices are a random walk, then the uncertainty about tomorrow's price is much smaller than the uncertainty about the price 10 years from now.
[3] Augmented Dickey-Fuller test on Amazon's adjusted stock price:
If the p-value is less than 5%, we can reject the null hypothesis that the series is a random walk with 95% confidence.
In this case (adjusted stock price for Amazon), the p-value is much higher than 0.05; it's ~1.0. Therefore, we can't reject the null hypothesis that Amazon's stock price is a random walk.
Interestingly, if we perform the same test on Amazon returns (percent change), we reject the null hypothesis; in other words, the stock returns are not a random walk process.
Module 3
AR
Today's value equals a mean $\mu$ plus a fraction $\phi$ of yesterday's value, plus noise: $R_t = \mu + \phi R_{t-1} + \epsilon_t$.
Since there's only a lag of one in the equation, this is called an AR model of order 1, or simply an AR(1). If the AR parameter $\phi = 1$, the process is a random walk; if $\phi = 0$, it is white noise.
In order for the process to be stable and stationary, $\phi$ must lie between $-1$ and $+1$.
[4] has 4 simulated time series with different AR parameters. When $\phi$ is positive and close to one, the series is persistent and mean-reverts slowly; when $\phi$ is negative, the series oscillates around its mean. A simulation sketch follows after the list below.
Higher order AR models
- AR(1): $R_t = \mu + \phi_1 R_{t-1} + \epsilon_t$
- AR(2): $R_t = \mu + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \epsilon_t$
- AR(3): $R_t = \mu + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \phi_3 R_{t-3} + \epsilon_t$
- ...
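A minimal sketch of simulating AR(1) series with statsmodels, using the same four parameter values as the MA code further below; note that ArmaProcess takes lag-polynomial coefficients, so the sign of $\phi$ is reversed:

import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 6))
for ax, phi in zip(axes.ravel(), [0.9, -0.9, 0.5, -0.5]):
    # AR lag polynomial is (1 - phi*L), hence the reversed sign.
    ar_process = ArmaProcess(ar=[1, -phi], ma=[1])
    simulated = ar_process.generate_sample(nsample=500)
    ax.plot(simulated)
    ax.set_title(f"φ = {phi}")
plt.show()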
MA
Today's value equals a mean $\mu$ plus noise, plus a fraction $\theta$ of yesterday's noise: $R_t = \mu + \epsilon_t + \theta \epsilon_{t-1}$.
Since there's only one lag, it's an MA model of order one, or simply an MA(1) model. If the MA parameter $\theta = 0$, the process is white noise.
MA models are stationary for all values of $\theta$.
Interpretation of MA(1) model
- Negative $\theta$: one-period mean reversion
- Positive $\theta$: one-period momentum
- Note: The one-period (lag-1) autocorrelation is $\theta / (1 + \theta^2)$, not $\theta$. For example, with $\theta = 0.9$ the lag-1 autocorrelation is $0.9 / 1.81 \approx 0.497$.
Higher order MA models
- MA(1): $R_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1}$
- MA(2): $R_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2}$
- MA(3): $R_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \theta_3 \epsilon_{t-3}$
- ...
Misc
- Higher-frequency stock returns are a nice example of an MA(1) process.
- Stocks trade at discrete one-cent increments rather than continuous prices, so a stock can bounce back and forth over a one-cent range for long periods of time. This is called the bid/ask bounce. The bid/ask bounce induces a significant negative lag-1 autocorrelation, but no autocorrelation beyond lag 1 [5].
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima_process import ArmaProcess

fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(10, 9))
THETA = [0.9, -0.9, 0.5, -0.5]
# Top two rows: simulated MA(1) series, one per theta.
for ax, theta in zip(axes[:2].ravel(), THETA):
    # MA coefficients need no sign reversal, unlike AR.
    ma_process = ArmaProcess(ar=[1], ma=[1, theta])
    simulated_data = ma_process.generate_sample(nsample=500)
    ax.plot(simulated_data)
    ax.set_title(f"θ = {theta}")
# Bottom two rows: the corresponding autocorrelation functions.
for ax, theta in zip(axes[2:].ravel(), THETA):
    ma_process = ArmaProcess(ar=[1], ma=[1, theta])
    simulated_data = ma_process.generate_sample(nsample=500)
    plot_acf(simulated_data, lags=20, alpha=0.05, title=f"θ = {theta}", ax=ax)

fig.suptitle("[4] Comparison of MA(1) series and autocorrelation function")
plt.show()
Module 4 - Estimating and forecasting an AR model
Identifying the order of an AR-model
- Partial autocorrelation function (PACF): Measures the incremental benefit of adding another lag, i.e., how significant one more lag is when you already have n lags [7].
- Information criteria: The more parameters in a model, the better the model will fit the data, but this can lead to overfitting. An information criterion adjusts the goodness-of-fit of a model by imposing a penalty based on the number of parameters used (think L1/L2 regularization).
- Two popular adjusted goodness-of-fit measures: [8]
- AIC (Akaike information criterion)
- BIC (Bayesian information criterion) In practice, the best way to use BIC is to fit several models, each with a different number of parameters, and choose the one with the lowest information criterion [8][9]. A sketch follows below.
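A sketch of order selection by BIC, assuming statsmodels' ARIMA estimator (an AR(p) is ARIMA with order (p, 0, 0)); the data here are simulated so the "right" answer is known:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Simulate an AR(2) process (phi_1 = 0.6, phi_2 = 0.3).
data = ArmaProcess(ar=[1, -0.6, -0.3], ma=[1]).generate_sample(nsample=1000)

# Fit AR(p) models for p = 1..4 and pick the lowest BIC.
bics = {}
for p in range(1, 5):
    result = ARIMA(data, order=(p, 0, 0)).fit()
    bics[p] = result.bic
best_p = min(bics, key=bics.get)
print(bics, "-> best order:", best_p)  # should favor p = 2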