Time series analysis aims to understand the underlying structure of temporal data, to identify trends, seasonal variations, and cyclical patterns, and to forecast future values based on historical patterns. Temporal data consists of observations collected at successive points in time, often at regular intervals such as hourly, daily, monthly, quarterly, or annually. This type of data is prevalent in fields such as finance, economics, environmental science, and engineering.
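Trend and seasonal variation can be separated with a classical additive decomposition, which models a series as trend + seasonal + residual. The sketch below is a minimal pandas version under stated assumptions: an odd seasonal period (so a plain centered moving average estimates the trend) and a synthetic daily series with a made-up weekly pattern. `additive_decompose` is a hypothetical helper; statsmodels' `seasonal_decompose` provides a fuller implementation.

```python
import numpy as np
import pandas as pd


def additive_decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Classical additive decomposition: observed = trend + seasonal + residual.

    Sketch only: expects an odd `period`, so a plain centered rolling mean
    estimates the trend (even periods usually need a 2 x period moving average).
    """
    if period % 2 == 0:
        raise ValueError("this sketch expects an odd period")
    # Trend: centered moving average spanning exactly one seasonal cycle
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: average detrended value at each position within the cycle
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    # Center the seasonal effects so they sum to zero over one cycle
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=series.index)
    residual = series - trend - seasonal
    return pd.DataFrame(
        {"observed": series, "trend": trend, "seasonal": seasonal, "residual": residual}
    )


# Usage on synthetic daily data: linear trend + made-up weekly pattern + noise
rng = np.random.default_rng(0)
n_days = 140
index = pd.date_range("2024-01-01", periods=n_days, freq="D")
weekly_pattern = np.array([2.0, 1.0, 0.0, -1.0, -2.0, 0.5, -0.5])  # sums to zero
values = 10 + 0.05 * np.arange(n_days) + np.tile(weekly_pattern, n_days // 7)
series = pd.Series(values + rng.normal(0, 0.1, n_days), index=index)

parts = additive_decompose(series, period=7)
print(parts["seasonal"].iloc[:7].round(2))
```

The first and last three trend values are NaN because a centered 7-day window does not fit at the edges; statsmodels' `seasonal_decompose` behaves the same way unless asked to extrapolate the trend.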
In this project, we will focus on one application of time series data: stock price prediction. We will learn how to prepare and visualize stock prices, analyze their patterns, select the best model, and make a next-day forecast.
First, we will go through the following steps to acquire the stock price data:
- Use the yfinance package (https://pypi.org/project/yfinance/) to download market data from the Yahoo! Finance API
- Define the date range that we want to use ('2020-01-01' to '2024-04-30')
- Load GameStop (GME, NYSE) historical data
Then, we will analyze the data and build the statistical model that gives the most accurate day-ahead forecast. Along the way, we will compare the statsmodels library with pmdarima to show the differences between them.
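One way to judge which model gives "the most accurate day-ahead forecast" is walk-forward (rolling-origin) evaluation: refit on all data up to day t, forecast day t+1, and score the errors. The sketch below assumes a synthetic AR(1) series and uses two hypothetical forecasters, a naive last-value baseline and an AR(1) fitted by least squares with NumPy, as stand-ins for the statsmodels and pmdarima ARIMA models used in this project.

```python
import numpy as np


def naive_forecast(history: np.ndarray) -> float:
    """Random-walk baseline: tomorrow's value equals today's value."""
    return float(history[-1])


def ar1_forecast(history: np.ndarray) -> float:
    """Fit y_t = c + phi * y_{t-1} by least squares, then forecast one step ahead.

    Hypothetical stand-in for an ARIMA model fitted with statsmodels or pmdarima.
    """
    design = np.column_stack([np.ones(len(history) - 1), history[:-1]])
    (c, phi), *_ = np.linalg.lstsq(design, history[1:], rcond=None)
    return float(c + phi * history[-1])


def walk_forward_rmse(series: np.ndarray, forecaster, n_test: int) -> float:
    """Rolling-origin evaluation over the last `n_test` points.

    For each test day t, the forecaster sees only series[:t] and predicts series[t].
    """
    errors = [series[t] - forecaster(series[:t])
              for t in range(len(series) - n_test, len(series))]
    return float(np.sqrt(np.mean(np.square(errors))))


# Usage on a synthetic mean-reverting AR(1) series (phi = 0.3)
rng = np.random.default_rng(42)
y = np.zeros(1000)
for t in range(1, len(y)):
    y[t] = 0.3 * y[t - 1] + rng.normal()

scores = {
    "naive": walk_forward_rmse(y, naive_forecast, n_test=500),
    "ar1": walk_forward_rmse(y, ar1_forecast, n_test=500),
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Any candidate can be passed as `forecaster` as long as it maps a history array to one number. Refitting an ARIMA model at every step is much slower than this least-squares fit, so in practice the test window is usually kept short.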
Step 0: Import Libraries
# Basic operations
import pandas as pd
from pandas import DataFrame
import numpy as np
# Date and time handling
from datetime import datetime
# Data visualizations
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import matplotlib.pyplot as plt
import seaborn as sns
# Time series functions: statsmodels
from statsmodels.tsa.stattools import acf, pacf, adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import statsmodels.api as sm
from matplotlib import rcParams
# Time series functions: pmdarima
import pmdarima as pm
from pmdarima import model_selection
from pmdarima import auto_arima
# Ignore harmless warnings
import warnings
warnings.filterwarnings("ignore")
Next, we define a custom Plotly template, "dc", which we will use for the charts going forward.
# Set a template for plotly
# Set template colors
dc_colors = ["#2B3A64", "#96aae3", "#C3681D", "#EFBD95", "#E73F74", "#80BA5A",
             "#E68310", "#008695", "#CF1C90", "#f97b72", "#4b4b8f", "#A5AA99"]
# Define template
pio.templates["dc"] = go.layout.Template(
    layout=dict(
        font={"family": "Poppins, Sans-serif", "color": "#505050"},
        title={"font": {"family": "Poppins, Sans-serif", "color": "black"},
               "yanchor": "top", "y": 0.92, "xanchor": "left", "x": 0.025},
        plot_bgcolor="white",
        paper_bgcolor="white",
        hoverlabel=dict(bgcolor="white"),
        margin=dict(l=100, r=50, t=75, b=70),
        colorway=dc_colors,
        xaxis=dict(showgrid=False),
        yaxis=dict(showgrid=True,
                   gridwidth=0.1,
                   gridcolor='lightgrey',
                   showline=True,
                   nticks=10,
                   linewidth=1,
                   linecolor='black',
                   rangemode="tozero")))
Step 1: Get to know the dataset
1.1. Loading the Data
# Import yfinance
import yfinance as yf
# Set the date range (yfinance treats the end date as exclusive)
start = '2020-01-01'
stop = '2024-04-30'
# Set the ticker we want to use (GameStop)
ticker = "GME"
# Get the data for the ticker GME; auto_adjust=False keeps the 'Adj Close' column,
# which newer yfinance releases drop by default
gme = yf.download(ticker, start=start, end=stop, auto_adjust=False)
# Newer yfinance releases return (field, ticker) MultiIndex columns even for one ticker
if isinstance(gme.columns, pd.MultiIndex):
    gme.columns = gme.columns.get_level_values(0)
# Keep the price and volume columns, indexed by date
gme = gme[["Open", "High", "Low", "Close", "Adj Close", "Volume"]]
gme.index.name = "Date"
# Check the column types and preview the DataFrame
print(gme.dtypes)
gme.head(5)
1.2. Inspecting the distributions
Let's get a sense of the data over the period we selected.
# Get a numeric summary of the data
gme.describe().round(2)
# Check for missing data (none found)
gme.isnull().any()
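For a trending price series, `describe()` on price levels mixes values from very different periods, so daily returns are a common complement when inspecting distributions. A minimal sketch, assuming a synthetic frame whose column names mirror the GME data (the prices and volumes are made up), that counts missing values per column with `isnull().sum()` and summarizes the daily percentage change of `Close`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the GME frame: business-day index, made-up prices
rng = np.random.default_rng(7)
dates = pd.bdate_range("2024-01-01", periods=60, name="Date")
close = 20 * np.exp(np.cumsum(rng.normal(0, 0.03, len(dates))))
frame = pd.DataFrame(
    {
        "Open": close * (1 + rng.normal(0, 0.01, len(dates))),
        "Close": close,
        "Volume": rng.integers(1_000_000, 5_000_000, len(dates)),
    },
    index=dates,
)
# Inject one gap so the missing-value check has something to report
frame.loc[frame.index[10], "Open"] = np.nan

# Count missing values per column (more informative than .any())
missing_per_column = frame.isnull().sum()
print(missing_per_column)

# Daily simple returns; the first day has no previous close, so drop it
daily_returns = frame["Close"].pct_change().dropna()
returns_summary = daily_returns.describe().round(4)
print(returns_summary)
```

On the real data, the same two lines apply directly to `gme`, replacing the synthetic `frame`.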