Skip to content
Stock Exchange Data
Timeseries with Stock Exchange Data
This dataset consists of stock exchange data since 1965 for several indexes. It contains the daily stock prices along with the volume traded each day.
Not sure where to begin? Scroll to the bottom to find challenges!
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
stock = pd.read_csv("stock_data.csv", index_col=None)
print(stock.shape)
stock.head(100)Data Dictionary
| Column | Explanation |
|---|---|
| Index | Ticker symbol for indexes |
| Date | Data of observation |
| Open | Opening price |
| High | Highest price during trading day |
| Low | Lowest price during trading day |
| Close | Close price |
| Adj Close | Close price adjusted for stock splits and dividends |
| Volume | Number of shares traded during trading day |
| CloseUSD | Close price in terms of USD |
Source of dataset.
Exploratory data analysis
# date format
stock.info()# convert Date
stock['Date'] = pd.to_datetime(stock['Date'])# take a look at the indices
stock_pivot = stock[['Date', 'Index','Close']] \
.pivot(index = 'Date', columns = 'Index', values = 'Close')
stock_pivot.isna().sum()# visualize timeseries with missing values
stock_pivot.plot()# starting data
start = stock.groupby('Index')['Date'].agg('min').sort_values()
sns.histplot(data=pd.to_datetime(start).dt.year, bins = 8)it seems like data is clean but each index has a different starting date from 1965 to 2012
# Calculate the annual return
an_return = pd.DataFrame(stock_pivot.pct_change().resample('Y').sum()).sum()
an_volatility = pd.DataFrame(stock_pivot.pct_change().resample('Y').std()).mean()ax = sns.barplot(data=an_return)
ax.set(xlabel='index', ylabel='annual return')
plt.xticks(rotation=30)
ax2 = plt.twinx()
sns.lineplot(data=an_volatility, ax=ax2, color = 'red')
plt.grid()
ax2.set_ylabel('annual volatility', color='red')The best annual return on investment with less volatility is index IXIC. For more secure investment with the lowest volatility, indices GSPTSE and NYA are the best although return on investment is less than IXIC.
# visualize best indices with a 30 days rolling windows since the negining of the covid break through
ixic = stock_pivot['IXIC'].dropna().resample('D').interpolate() \
.pct_change().rolling(30).mean().dropna().mul(100)
gsptse = stock_pivot['GSPTSE'].dropna().resample('D').interpolate() \
.pct_change().rolling(30).mean().dropna().mul(100)
nya = stock_pivot['NYA'].dropna().resample('D').interpolate() \
.pct_change().rolling(30).mean().dropna().mul(100)
ax = pd.concat([ixic, gsptse, nya], axis=1).loc['2019-11':,:].plot()
ax.set(xlabel='date', ylabel='annual return')
plt.grid()
plt.title('Daily return with 30 days rolling windows since COVID')