Analyze Multiple Time Series

This template provides a playbook for analyzing multiple time series simultaneously. You will take an in-depth look at your time series data by:

  1. Loading and visualizing your data
  2. Inspecting the distribution
  3. Analyzing subsets of your data
  4. Decomposing time series into seasonality, trend and noise
  5. Visualizing correlations with a clustermap

# Load packages
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics import tsaplots
import statsmodels.api as sm
import seaborn as sns

1. Load and visualize your data

# Upload your data as CSV and load as a data frame
df = pd.read_csv(
    "data.csv",
    parse_dates=["datestamp"],  # Tell pandas which column(s) to parse as dates
    index_col="datestamp",  # Use a date column as your index
)
df.head()

datestamp | Agriculture | Business services | Construction | Durable goods manufacturing | Education and Health | Finance | Government | Information | Leisure and hospitality | Manufacturing | Mining and Extraction | Nondurable goods manufacturing | Other | Self-employed | Transportation and Utilities | Wholesale and Retail Trade
2000-01-01 | 10.3 | 5.7 | 9.7 | 3.2 | 2.3 | 2.7 | 2.1 | 3.4 | 7.5 | 3.6 | 3.9 | 4.4 | 4.9 | 2.3 | 4.3 | 5.0
2000-02-01 | 11.5 | 5.2 | 10.6 | 2.9 | 2.2 | 2.8 | 2.0 | 2.9 | 7.5 | 3.4 | 5.5 | 4.2 | 4.1 | 2.5 | 4.0 | 5.2
2000-03-01 | 10.4 | 5.4 | 8.7 | 2.8 | 2.5 | 2.6 | 1.5 | 3.6 | 7.4 | 3.6 | 3.7 | 5.1 | 4.3 | 2.0 | 3.5 | 5.1
2000-04-01 | 8.9 | 4.5 | 5.8 | 3.4 | 2.1 | 2.3 | 1.3 | 2.4 | 6.1 | 3.7 | 4.1 | 4.0 | 4.2 | 2.0 | 3.4 | 4.1
2000-05-01 | 5.1 | 4.7 | 5.0 | 3.4 | 2.7 | 2.2 | 1.9 | 3.5 | 6.2 | 3.4 | 5.3 | 3.6 | 4.5 | 1.9 | 3.4 | 4.3

# Plot settings
%config InlineBackend.figure_format='retina'
plt.rcParams["figure.figsize"] = (18, 10)
plt.style.use('ggplot')

# Plot all time series in the df DataFrame
ax = df.plot(
    colormap="Spectral",  # Set a colormap to avoid overlapping colors
    fontsize=10,  # Set fontsize
    linewidth=0.8, # Set width of lines
)

# Set labels and legend
ax.set_xlabel("Date", fontsize=12)  # X axis text
ax.set_ylabel("Unemployment Rate", fontsize=12)  # Y axis text
ax.set_title("Unemployment rate of U.S. workers by industry", fontsize=15)
ax.legend(
    loc="center left",  # Set location of legend within bounding box
    bbox_to_anchor=(1.0, 0.5),  # Set location of bounding box
)

# Annotate your plots with vertical lines
ax.axvline(
    "2001-07-01",  # Position of vertical line
    color="red",  # Color of line
    linestyle="--",  # Style of line
    linewidth=2, # Thickness of line
)
ax.axvline("2008-09-01", color="red", linestyle="--", linewidth=2)

# Show plot
plt.show()

2. Inspect the distribution

df.describe()

      | Agriculture | Business services | Construction | Durable goods manufacturing | Education and Health | Finance | Government | Information | Leisure and hospitality | Manufacturing | Mining and Extraction | Nondurable goods manufacturing | Other | Self-employed | Transportation and Utilities | Wholesale and Retail Trade
count | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000 | 122.000000
mean  | 9.840984 | 6.919672 | 9.426230 | 6.025410 | 3.420492 | 3.540164 | 2.581148 | 5.486885 | 8.315574 | 5.982787 | 5.088525 | 5.930328 | 5.096721 | 3.031967 | 4.935246 | 5.766393
std   | 3.962067 | 1.862534 | 4.587619 | 2.854475 | 0.877538 | 1.235405 | 0.686750 | 2.016582 | 1.605570 | 2.484221 | 2.942428 | 1.922330 | 1.317457 | 1.124429 | 1.753340 | 1.463417
min   | 2.400000 | 4.100000 | 4.400000 | 2.800000 | 1.800000 | 2.100000 | 1.300000 | 2.400000 | 5.900000 | 3.100000 | 0.300000 | 3.100000 | 2.900000 | 1.700000 | 2.300000 | 3.600000
25%   | 6.900000 | 5.600000 | 6.100000 | 4.125000 | 2.900000 | 2.700000 | 2.100000 | 3.900000 | 7.300000 | 4.500000 | 3.200000 | 4.825000 | 4.200000 | 2.400000 | 3.900000 | 4.800000
50%   | 9.600000 | 6.450000 | 8.100000 | 5.100000 | 3.200000 | 3.300000 | 2.400000 | 5.150000 | 8.050000 | 5.300000 | 4.300000 | 5.500000 | 4.900000 | 2.700000 | 4.400000 | 5.400000
75%   | 11.950000 | 7.875000 | 10.975000 | 6.775000 | 3.700000 | 3.700000 | 2.875000 | 6.900000 | 8.800000 | 6.600000 | 6.050000 | 6.100000 | 5.600000 | 3.200000 | 5.400000 | 6.200000
max   | 21.300000 | 12.000000 | 27.100000 | 14.100000 | 6.100000 | 7.500000 | 5.100000 | 11.500000 | 14.200000 | 13.000000 | 16.100000 | 12.000000 | 10.000000 | 7.200000 | 11.300000 | 10.500000

# Generate a boxplot
ax = df.boxplot(fontsize=10, vert=False)  # vert=False draws the boxes horizontally
ax.set_xlabel("Unemployment Percentage")
ax.set_title("Distribution of Unemployment by industry")
plt.show()

3. Analyze subsets of your data

a) Visualize (partial) autocorrelation

Autocorrelation measures how strongly a time series is correlated with a lagged copy of itself: the value at lag k is the correlation between observations that are k time steps apart. Partial autocorrelation measures the same relationship at lag k after removing the effect of the shorter lags in between.

# Display the autocorrelation plot of your time series
fig = tsaplots.plot_acf(
    df["Agriculture"],  # Change column to inspect
    lags=24,  # Set lag period
)

# Show plot
plt.show()

# Display the partial autocorrelation plot of your time series
fig = tsaplots.plot_pacf(
    df["Agriculture"],  # Change column to inspect
    lags=24,  # Set lag period
)

# Show plot
plt.show()
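
To make the definition concrete, here is a minimal sketch that computes the autocorrelation at a given lag by hand. It uses a synthetic monthly sine wave rather than the unemployment data, and the helper name `autocorrelation` is an illustrative assumption. This pairwise Pearson version can differ slightly from the estimator that statsmodels uses inside `plot_acf`.

```python
import numpy as np
import pandas as pd


def autocorrelation(series: pd.Series, lag: int) -> float:
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    shifted = series.shift(lag)
    # Series.corr ignores the NaN pairs that the shift creates at the start
    return series.corr(shifted)


# Synthetic monthly series with a 12-month cycle (illustrative data only)
months = pd.date_range("2000-01-01", periods=120, freq="MS")
values = np.sin(2 * np.pi * np.arange(120) / 12)
series = pd.Series(values, index=months)

acf_12 = autocorrelation(series, 12)  # One full cycle apart: close to +1
acf_6 = autocorrelation(series, 6)  # Half a cycle apart: close to -1
print(round(acf_12, 3), round(acf_6, 3))
```

A strong positive spike at lag 12 in the plots above would point to the same kind of yearly cycle in your own data.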

b) Group data by different time periods

Uncover patterns by grouping your data by different time periods, e.g. by year, month or day.

# Extract time period of interest
index_year = df.index.year  # Choose year, month, day etc.

# Compute mean for each time period
df_by_year = df.groupby(index_year).mean()  # Swap .mean() for another aggregation, e.g. .median()

# Plot the mean for each time period
ax = df_by_year.plot(fontsize=10, linewidth=1)

# Set axis labels and legend
ax.set_xlabel("Year", fontsize=12)
ax.set_ylabel("Mean unemployment rate", fontsize=12)
ax.axvline(
    2008,  # Position of vertical line
    color="red",  # Color of line
    linestyle="--",  # Style of line
    linewidth=2,  # Thickness of line
)

ax.legend(
    loc="center left",  # Placement of legend within bounding box (bbox)
    bbox_to_anchor=(1.0, 0.5),  # Location of bounding box
)
plt.show()
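
As a small illustration of why the choice of period matters, the sketch below groups the same synthetic series by year and by month. The data and the names `df_demo`, `by_year` and `by_month` are made up for this example: the series rises by one unit per year and has a bump every January, so the yearly grouping exposes the rise while the monthly grouping exposes the recurring bump.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data: four years, +1 per year, +3 every January (illustrative only)
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
years = idx.year.to_numpy()
months = idx.month.to_numpy()
values = 10 + (years - 2000) + np.where(months == 1, 3, 0)
df_demo = pd.DataFrame({"rate": values}, index=idx)

# Grouping by year averages out the January bump and shows the yearly rise
by_year = df_demo.groupby(df_demo.index.year).mean()

# Grouping by calendar month averages over the years and shows the January bump
by_month = df_demo.groupby(df_demo.index.month).mean()

print(by_year)
print(by_month)
```

In the template, replacing `df.index.year` with `df.index.month` gives the equivalent monthly view of your own data.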

4. Decompose time series into seasonality, trend and noise

A time series can be decomposed into seasonality, trend and noise. You can interpret them as follows:

  • Trend is the long-term increase or decrease in the level of the series.
  • Seasonality is the short-term cycle that repeats at a fixed interval.
  • Noise is the random variation left over once trend and seasonality are removed.

# Run time series decomposition on each time series of the DataFrame
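df_names = df.columns
# seasonal_decompose infers the seasonal period from the index frequency;
# pass period=12 explicitly for monthly data if it cannot be inferred
df_decomp = {ts: sm.tsa.seasonal_decompose(df[ts]) for ts in df_names}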

# Capture the seasonal, trend and noise components for the decomposition of each time series
seasonal_dict = {ts: df_decomp[ts].seasonal for ts in df_names}
trend_dict = {ts: df_decomp[ts].trend for ts in df_names}
noise_dict = {ts: df_decomp[ts].resid for ts in df_names}

# Create a DataFrame from the dictionaries
seasonality_df = pd.DataFrame.from_dict(seasonal_dict)
trend_df = pd.DataFrame.from_dict(trend_dict)
noise_df = pd.DataFrame.from_dict(noise_dict)

# Remove the label for the index
seasonality_df.index.name = None
trend_df.index.name = None
noise_df.index.name = None

# Look at individual seasonality, trend or noise
# Change the DataFrame and column to explore
noise_df["Agriculture"].plot()
plt.show()

# Create a faceted plot of the seasonality_df DataFrame
seasonality_df[["Agriculture", "Manufacturing"]].plot(
    subplots=True,  # Draw each column in its own subplot
    layout=(2, 1),  # Arrange the subplots in 2 rows and 1 column
    sharey=False,  # Give each subplot its own y axis
    legend=True,  # Show legend
    fontsize=10,  # Set fontsize
    linewidth=2,  # Set width of line
)

plt.suptitle("Seasonality in Agriculture and Manufacturing", size=15)
plt.show()
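
To show what the three components look like numerically, here is a simplified additive decomposition done by hand with pandas on a synthetic monthly series. It is a sketch of the classical moving-average approach under the assumption of a 12-month period, not a reproduction of the statsmodels implementation, and all names and data in it are illustrative.

```python
import numpy as np
import pandas as pd

period = 12  # Assumed seasonal period for monthly data
n = 120
idx = pd.date_range("2000-01-01", periods=n, freq="MS")

# Synthetic series: level + linear trend + yearly cycle + random noise
rng = np.random.default_rng(0)
true_seasonal = 2.0 * np.sin(2 * np.pi * np.arange(n) / period)
series = pd.Series(
    5 + 0.05 * np.arange(n) + true_seasonal + rng.normal(0, 0.3, n), index=idx
)

# Trend: 12-term moving average followed by a 2-term average, shifted back
# so each value is centered on its month; the first and last 6 months are NaN
trend = series.rolling(period).mean().rolling(2).mean().shift(-(period // 2))

# Seasonality: average detrended value per calendar month, centered on zero
detrended = series - trend
month_effect = detrended.groupby(idx.month).mean()
month_effect -= month_effect.mean()
seasonal = pd.Series(np.asarray(idx.month.map(month_effect)), index=idx)

# Noise: whatever is left after removing trend and seasonality
resid = series - trend - seasonal

print(pd.DataFrame({"trend": trend, "seasonal": seasonal, "resid": resid}).head(12))
```

Wherever the trend is defined, the three components add back up to the original series, which is what makes the decomposition additive.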

5. Visualize correlations with a clustermap

A clustermap uses hierarchical clustering to order the rows and columns of a matrix by similarity, so that similar series are displayed next to one another.

# Get correlation matrix of your chosen DataFrame
seasonality_corr = seasonality_df.corr(
    method="spearman"  # Choose method to calculate correlation
)

# Customize the clustermap of the correlation matrix
grid = sns.clustermap(  # Returns a seaborn ClusterGrid, not a matplotlib Figure
    seasonality_corr,  # Choose correlation matrix to visualize
    annot=True,  # Show annotations
    annot_kws={"size": 10},  # Customize annotations
    linewidths=0.4,  # Width of the lines separating cells
    figsize=(15, 10),  # Size of the figure in inches
)

plt.setp(
    grid.ax_heatmap.xaxis.get_majorticklabels(),
    rotation=90,  # Change rotation of x-labels
)
plt.show()
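
The reordering step can be sketched without any plotting. The example below applies SciPy's hierarchical clustering to a small Spearman correlation matrix and reorders it by the resulting leaf order. The four synthetic columns are made up for illustration, and the average-linkage, Euclidean-distance settings are chosen to mirror what I understand to be seaborn's defaults; treat that as an assumption rather than a statement about seaborn's internals.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

# Two independent signals, each observed twice with a little noise (illustrative only)
rng = np.random.default_rng(42)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
demo = pd.DataFrame(
    {
        "a1": base_a + 0.1 * rng.normal(size=200),
        "b1": base_b + 0.1 * rng.normal(size=200),
        "a2": base_a + 0.1 * rng.normal(size=200),
        "b2": base_b + 0.1 * rng.normal(size=200),
    }
)
corr = demo.corr(method="spearman")

# Cluster the rows of the correlation matrix and read off the dendrogram leaf order
links = linkage(corr.to_numpy(), method="average", metric="euclidean")
order = leaves_list(links)

# Apply the same order to rows and columns so related series sit side by side
ordered = corr.iloc[order, order]
print(list(ordered.columns))
```

After reordering, the two columns derived from the same signal end up adjacent, which is the block structure a clustermap makes visible.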