Analyze Multiple Time Series
This template provides a playbook for analyzing multiple time series simultaneously. You will take an in-depth look at your time series data by:
- Loading and visualizing your data
- Inspecting the distribution
- Analyzing subsets of your data
- Decomposing time series into seasonality, trend and noise
- Visualizing correlations with a clustermap
# Load packages
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics import tsaplots
import statsmodels.api as sm
import seaborn as sns
1. Load and visualize your data
# Upload your data as CSV and load as a data frame
df = pd.read_csv(
"data.csv",
parse_dates=["datestamp"], # Tell pandas which column(s) to parse as dates
index_col="datestamp", # Use a date column as your index
)
df.head()
# Plot settings
%config InlineBackend.figure_format='retina'
plt.rcParams["figure.figsize"] = (18, 10)
plt.style.use('ggplot')
# Plot all time series in the df DataFrame
ax = df.plot(
colormap="Spectral", # Set a colormap to avoid overlapping colors
fontsize=10, # Set fontsize
linewidth=0.8, # Set width of lines
)
# Set labels and legend
ax.set_xlabel("Date", fontsize=12) # X axis text
ax.set_ylabel("Unemployment Rate", fontsize=12) # Y axis text
ax.set_title("Unemployment rate of U.S. workers by industry", fontsize=15)
ax.legend(
loc="center left", # Set location of legend within bounding box
bbox_to_anchor=(1.0, 0.5), # Set location of bounding box
)
# Annotate your plots with vertical lines
ax.axvline(
"2001-07-01", # Position of vertical line
color="red", # Color of line
linestyle="--", # Style of line
linewidth=2, # Thickness of line
)
ax.axvline("2008-09-01", color="red", linestyle="--", linewidth=2)
# Show plot
plt.show()
2. Inspect the distribution
df.describe()
# Generate a boxplot
ax = df.boxplot(fontsize=10, vert=False) # vert=False draws the boxes horizontally
ax.set_xlabel("Unemployment Percentage")
ax.set_title("Distribution of Unemployment by industry")
plt.show()
3. Analyze subsets of your data
a) Visualize (partial) autocorrelation
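Autocorrelation is the correlation between a time series and a lagged copy of itself. It measures how strongly the value at one point in time is related to the value a given number of periods (the lag) earlier. Partial autocorrelation measures the same relationship at a given lag after removing the effect of the shorter, intermediate lags.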
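To make the definition concrete, here is a minimal sketch that computes the autocorrelation at a single lag by hand. The synthetic monthly series and the helper `lag_correlation` are illustrative assumptions, not part of the template's dataset. The result matches pandas' `Series.autocorr`, though the values drawn by `plot_acf` may differ slightly because that estimator normalizes differently.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a 12-month cycle (illustrative data only)
rng = np.random.default_rng(0)
months = pd.date_range("2000-01-01", periods=120, freq="MS")
values = 10 + 3 * np.sin(2 * np.pi * np.arange(120) / 12) + rng.normal(0, 0.5, 120)
series = pd.Series(values, index=months)


def lag_correlation(s, lag):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    return s.corr(s.shift(lag))


acf_12 = lag_correlation(series, 12)  # Same month one year earlier
acf_6 = lag_correlation(series, 6)    # Opposite phase of the yearly cycle

print(f"lag 12: {acf_12:.2f}, lag 6: {acf_6:.2f}")
```

A strongly positive value at lag 12 and a strongly negative value at lag 6 are what a yearly cycle in monthly data should produce.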
# Display the autocorrelation plot of your time series
fig = tsaplots.plot_acf(
df["Agriculture"], # Change column to inspect
lags=24, # Set lag period
)
# Show plot
plt.show()
# Display the partial autocorrelation plot of your time series
fig = tsaplots.plot_pacf(
df["Agriculture"], # Change column to inspect
lags=24, # Set lag period
)
# Show plot
plt.show()
b) Group data by different time periods
Uncover patterns by grouping your data by time period, for example by year, month, or day.
# Extract time period of interest
index_year = df.index.year # Choose year, month, day etc.
# Compute mean for each time period
df_by_year = df.groupby(index_year).mean() # Replace .mean() with another aggregation function if needed
# Plot the mean for each time period
ax = df_by_year.plot(fontsize=10, linewidth=1)
# Set axis labels and legend
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Mean unemployment rate", fontsize=12)
ax.axvline(
2008, # Position of vertical line
color="red", # Color of line
linestyle="--", # Style of line
linewidth=2, # Thickness of line
)
ax.legend(
loc="center left", # Placement of legend within bounding box (bbox)
bbox_to_anchor=(1.0, 0.5), # Location of bounding box
)
plt.show()
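The same grouping pattern works for other periods. As a minimal sketch, the example below groups by month of the year instead of by year, which can help reveal seasonal patterns. The small `demo` DataFrame and its columns `A` and `B` are made up for illustration; with your own data you would group `df` directly.

```python
import numpy as np
import pandas as pd

# Four years of made-up monthly data (illustrative only)
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
demo = pd.DataFrame(
    {
        "A": np.arange(48) % 12,  # Repeats 0..11 every year
        "B": np.ones(48),         # Constant series
    },
    index=idx,
)

# Group by month of year (1-12) and average across all years
by_month = demo.groupby(demo.index.month).mean()

print(by_month)
```

Grouping by `demo.index.month` pools every January together, every February together, and so on, so each row of `by_month` is the average for that calendar month across all years.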
4. Decompose time series into seasonality, trend and noise
Seasonality, trend and noise are the components into which a time series is commonly decomposed. You can interpret them as follows:
- Trend is the long-term increase or decrease in the series.
- Seasonality highlights the repeating short-term cycle in the series.
- Noise is the random variation in the series.