Moving Averages in pandas
Learn how you can capture trends and make sense out of time series data with the help of a moving or rolling average.
Introduction
A moving average, also called a rolling or running average, is used to analyze time-series data by calculating averages of successive subsets of the complete dataset. Because the average is recomputed as the subset moves through time, it is also called a moving mean (MM) or rolling mean.
There are various ways to calculate a rolling average. One is to take a fixed-size subset from the complete series of numbers. The first moving average is calculated by averaging the first subset, and the subset is then shifted forward by one position: the next value in the series is included and the oldest value is dropped.
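The sliding-subset idea can be sketched in a few lines of plain Python. This is only an illustration: the function name `moving_averages` and the numbers in `series` are made up for this example and are not part of the tutorial's dataset.

```python
def moving_averages(values, window):
    """Average each run of `window` consecutive values, sliding one step at a time."""
    averages = []
    for start in range(len(values) - window + 1):
        subset = values[start:start + window]  # current fixed-size subset
        averages.append(sum(subset) / window)
    return averages


series = [2, 4, 6, 8, 10]  # illustrative values only
result = moving_averages(series, window=3)
print(result)  # [4.0, 6.0, 8.0]
```

Each step drops the oldest value and takes in the next one, so a series of 5 values with a window of 3 yields 3 averages.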
The moving average is mostly used with time series data to smooth out short-term fluctuations and highlight longer-term trends.
A few examples of time series data can be stock prices, weather reports, air quality, gross domestic product, employment, etc.
In general, the moving average smooths the data.
The moving average is the backbone of many algorithms. One such algorithm is the Autoregressive Integrated Moving Average (ARIMA) model, which is used to make time series predictions; its moving-average component is applied to past forecast errors rather than to the raw observations.
There are various types of moving averages:
- Simple Moving Average (SMA): The simple moving average uses a sliding window to take the average over a set number of time periods. It is an equally weighted mean of the most recent data points in the window.
To understand SMA further, let's take an example: a sequence of n values

$$x_1, x_2, \ldots, x_n$$

The equally weighted rolling average at the n-th data point is then the mean of the previous M data points, where M is the size of the sliding window:

$$\mathrm{SMA}_n = \frac{x_{n-M+1} + x_{n-M+2} + \cdots + x_n}{M}$$

Similarly, to calculate each succeeding rolling average, the new value is added to the sum and the oldest value in the window is dropped. Because the previous average is already known, a full summation is not required each time:

$$\mathrm{SMA}_{n+1} = \mathrm{SMA}_n + \frac{x_{n+1} - x_{n-M+1}}{M}$$
- Cumulative Moving Average (CMA): Unlike the simple moving average, which drops the oldest observation as each new one is added, the cumulative moving average considers all prior observations. CMA is not a very good technique for analyzing trends and smoothing out the data, because it averages all of the data up to the current point, so older observations dilute recent changes. The equally weighted average of the sequence of n values

$$x_1, x_2, \ldots, x_n$$

up to the current time is given by:

$$\mathrm{CMA}_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Similarly, the cumulative average can be updated for every new value using the formula below:

$$\mathrm{CMA}_{n+1} = \frac{x_{n+1} + n \cdot \mathrm{CMA}_n}{n+1}$$
- Exponential Moving Average (EMA): Unlike SMA and CMA, the exponential moving average gives more weight to recent prices, so it can capture a change in the trend more quickly. How strongly the EMA reacts depends on how much weight is placed on the most recent data.
Since EMAs place more weight on recent data than on older data, they respond to the latest price changes faster than SMAs do. This makes the results from EMAs more timely, which is why the EMA is often preferred over the other techniques.
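As a rough sketch, the three averages described above can be computed with pandas and checked against their one-step update rules. Several things here are assumptions for illustration: the short `demand` series, the window size `M = 3`, and the smoothing factor `alpha = 0.5`. The EMA recursion shown, EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}, is the standard recursive form that pandas applies with `adjust=False`; the text above does not state an EMA formula.

```python
import pandas as pd

demand = pd.Series([290, 260, 288, 300, 310, 303], dtype=float)  # illustrative values
M = 3        # SMA window size (assumed)
alpha = 0.5  # EMA smoothing factor (assumed)

sma = demand.rolling(window=M).mean()               # NaN until M values are available
cma = demand.expanding().mean()                     # mean of all values seen so far
ema = demand.ewm(alpha=alpha, adjust=False).mean()  # recursive EMA, seeded with the first value

# SMA update: add the newest value, drop the oldest one in the window.
k = 3  # zero-based position of the newest value
sma_next = sma.iloc[k - 1] + (demand.iloc[k] - demand.iloc[k - M]) / M

# CMA update: combine the new value with the previous average of n values.
n = 3  # number of values already averaged
cma_next = (demand.iloc[n] + n * cma.iloc[n - 1]) / (n + 1)

# EMA update: weighted blend of the new value and the previous EMA.
ema_next = alpha * demand.iloc[k] + (1 - alpha) * ema.iloc[k - 1]

print(abs(sma_next - sma.iloc[k]) < 1e-9)
print(abs(cma_next - cma.iloc[n]) < 1e-9)
print(abs(ema_next - ema.iloc[k]) < 1e-9)
```

Each update reuses the previous average instead of re-summing the data, which is the point of the incremental formulas.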
Enough of theory, right? Let's jump to the practical implementation of the moving average.
Implementing Moving Average on Time Series Data
Simple Moving Average (SMA)
First, let's create dummy time series data and try implementing SMA by hand with an explicit Python loop.
Assume that the demand for a product is observed for 12 months (1 year), and you need to find moving averages for window sizes of 3 and 4 months.
Import modules
# pip install -r requirements.txt
import pandas as pd
import numpy as np
product = {'month' : [1,2,3,4,5,6,7,8,9,10,11,12],'demand':[290,260,288,300,310,303,329,340,316,330,308,310]}
df = pd.DataFrame(product)
df.head()
Let's calculate SMA for a window size of 3, which means you will consider three values each time to calculate the moving average, and for every new value, the oldest value will be ignored.
To implement this, you will use the pandas `iloc` function. Since the `demand` column is what you need, you fix its column position (1) in `iloc`, while the row position is a variable `i` that you keep incrementing until you reach the end of the DataFrame. Each average is written to a new `SMA_3` column with `loc`, on the row where its window ends.
for i in range(0,df.shape[0]-2):
    # average rows i, i+1, i+2 of the demand column and store it on the last row of the window
    df.loc[df.index[i+2],'SMA_3'] = np.round(((df.iloc[i,1]+ df.iloc[i+1,1] +df.iloc[i+2,1])/3),1)
df.head()
For a sanity check, let's also use the pandas built-in `rolling` function and see if it matches our custom Python-based simple moving average.
df['pandas_SMA_3'] = df.iloc[:,1].rolling(window=3).mean()
df.head()
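Cool, so as you can see, the custom and pandas moving averages agree. The only difference is that the custom column is rounded to one decimal place while the pandas column is not, which means your implementation of SMA was correct.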
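Rather than comparing the two columns by eye, the check can be done programmatically. The following is a minimal sketch that rebuilds the same DataFrame and compares the columns with `np.allclose`. The tolerance of 0.051 is an assumption chosen to sit just above 0.05, the largest error that rounding to one decimal place can introduce.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'month': list(range(1, 13)),
    'demand': [290, 260, 288, 300, 310, 303, 329, 340, 316, 330, 308, 310],
})

# Loop-based SMA, rounded to one decimal place as in the walkthrough
for i in range(0, df.shape[0] - 2):
    df.loc[df.index[i + 2], 'SMA_3'] = np.round(df.iloc[i:i + 3, 1].mean(), 1)

# pandas built-in rolling mean (unrounded)
df['pandas_SMA_3'] = df['demand'].rolling(window=3).mean()

# equal_nan=True treats the two leading NaN rows as matching
matches = np.allclose(df['SMA_3'], df['pandas_SMA_3'],
                      rtol=0, atol=0.051, equal_nan=True)
max_gap = (df['SMA_3'] - df['pandas_SMA_3']).abs().max()  # NaN rows are skipped

print(matches)
print(max_gap)
```

The first two rows are NaN in both columns because a full window of three values is not yet available.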
Let's also quickly calculate the simple moving average for a window size of 4.
for i in range(0,df.shape[0]-3):
    # average rows i through i+3 of the demand column and store it on the last row of the window
    df.loc[df.index[i+3],'SMA_4'] = np.round(((df.iloc[i,1]+ df.iloc[i+1,1] +df.iloc[i+2,1]+df.iloc[i+3,1])/4),1)
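The two loops differ only in the window size, so they can be folded into one function. The sketch below is one possible way to do that; `add_sma` is a hypothetical helper name introduced here, not something provided by pandas. It is checked against `rolling` for a window of 4.

```python
import numpy as np
import pandas as pd


def add_sma(frame, column, window):
    """Hypothetical helper: return a copy with a loop-based SMA_<window> column."""
    out = frame.copy()
    name = f'SMA_{window}'
    col = out.columns.get_loc(column)  # integer position of the source column
    for i in range(0, out.shape[0] - window + 1):
        last = i + window - 1  # row where the current window ends
        out.loc[out.index[last], name] = np.round(out.iloc[i:i + window, col].mean(), 1)
    return out


df = pd.DataFrame({
    'month': list(range(1, 13)),
    'demand': [290, 260, 288, 300, 310, 303, 329, 340, 316, 330, 308, 310],
})

df = add_sma(df, 'demand', 4)
df['pandas_SMA_4'] = df['demand'].rolling(window=4).mean()

# Allow for the one-decimal rounding applied in the loop
close = np.allclose(df['SMA_4'], df['pandas_SMA_4'],
                    rtol=0, atol=0.051, equal_nan=True)
print(close)
```

Slicing with `iloc[i:i + window, col]` replaces the hand-written sum of individual cells, so the same code works for any window size up to the length of the data.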