Skip to content

Timeseries with Stock Exchange Data

This dataset consists of stock exchange data since 1965 for several indexes. It contains the daily stock prices along with the volume traded each day.

Not sure where to begin? Scroll to the bottom to find challenges!

import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
stock = pd.read_csv("stock_data.csv", index_col=None)
print(stock.shape)
stock.head(100)

Data Dictionary

ColumnExplanation
IndexTicker symbol for indexes
DateData of observation
OpenOpening price
HighHighest price during trading day
LowLowest price during trading day
CloseClose price
Adj CloseClose price adjusted for stock splits and dividends
VolumeNumber of shares traded during trading day
CloseUSDClose price in terms of USD

Source of dataset.

Exploratory data analysis

# date format
stock.info()
# convert Date
stock['Date'] = pd.to_datetime(stock['Date'])
# take a look at the indices
stock_pivot = stock[['Date', 'Index','Close']] \
            .pivot(index = 'Date', columns = 'Index', values = 'Close')
stock_pivot.isna().sum()
# visualize timeseries with missing values
stock_pivot.plot()
# starting data
start = stock.groupby('Index')['Date'].agg('min').sort_values()
sns.histplot(data=pd.to_datetime(start).dt.year, bins = 8)

it seems like data is clean but each index has a different starting date from 1965 to 2012

# Calculate the annual return
an_return = pd.DataFrame(stock_pivot.pct_change().resample('Y').sum()).sum()
an_volatility = pd.DataFrame(stock_pivot.pct_change().resample('Y').std()).mean()
ax = sns.barplot(data=an_return)
ax.set(xlabel='index', ylabel='annual return')
plt.xticks(rotation=30)
ax2 = plt.twinx()
sns.lineplot(data=an_volatility, ax=ax2, color = 'red')
plt.grid()
ax2.set_ylabel('annual volatility', color='red')

The best annual return on investment with less volatility is index IXIC. For more secure investment with the lowest volatility, indices GSPTSE and NYA are the best although return on investment is less than IXIC.

# visualize best indices with a 30 days rolling windows since the negining of the covid break through 
ixic = stock_pivot['IXIC'].dropna().resample('D').interpolate() \
        .pct_change().rolling(30).mean().dropna().mul(100)
gsptse = stock_pivot['GSPTSE'].dropna().resample('D').interpolate() \
        .pct_change().rolling(30).mean().dropna().mul(100)
nya = stock_pivot['NYA'].dropna().resample('D').interpolate() \
        .pct_change().rolling(30).mean().dropna().mul(100)
ax = pd.concat([ixic, gsptse, nya], axis=1).loc['2019-11':,:].plot()
ax.set(xlabel='date', ylabel='annual return')
plt.grid()
plt.title('Daily return with 30 days rolling windows since COVID')