Data Dictionary

Column    | Explanation
Index     | Ticker symbol for indexes
Date      | Date of observation
Open      | Opening price
High      | Highest price during trading day
Low       | Lowest price during trading day
Close     | Close price
Adj Close | Close price adjusted for stock splits and dividends
Volume    | Number of shares traded during trading day
CloseUSD  | Close price in terms of USD

Source of dataset.

Don't know where to start?

Challenges are brief tasks designed to help you practice specific skills:

  • 🗺️ Explore: Which index has produced the highest average annual return?
  • 📊 Visualize: Create a plot visualizing a 30-day moving average for an index of your choosing.
  • 🔎 Analyze: Compare the volatilities of the indexes included in the dataset.
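
As a starting point, the three challenges can be sketched on a small synthetic frame that mimics the `Index`, `Date` and `Close` columns from the data dictionary. The tickers `AAA` and `BBB`, the simulated prices, the 252-trading-day year used for annualizing, and the reading of "30 day" as 30 trading rows are all assumptions made for illustration; none of the numbers come from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: two made-up tickers with simulated prices.
dates = pd.bdate_range("2020-01-01", periods=300)
rng = np.random.default_rng(0)
demo = pd.concat(
    [
        pd.DataFrame({
            "Index": ticker,
            "Date": dates,
            "Close": 100 * np.exp(np.cumsum(rng.normal(0.0003, scale, len(dates)))),
        })
        for ticker, scale in [("AAA", 0.01), ("BBB", 0.02)]
    ],
    ignore_index=True,
)

# One column per index, one row per trading day.
prices = demo.pivot(index="Date", columns="Index", values="Close")

# Daily simple returns; the first row has no previous day and is dropped.
daily_returns = prices.pct_change().dropna()

# Explore: compound the daily returns within each calendar year, then average the years.
# Partial years (here the final one) are included as-is.
annual_returns = (1 + daily_returns).groupby(daily_returns.index.year).prod() - 1
avg_annual_return = annual_returns.mean()

# Visualize: 30-row moving average of the close for one index (first 29 values are NaN).
ma30 = prices["AAA"].rolling(window=30).mean()

# Analyze: annualized volatility, assuming 252 trading days per year.
volatility = daily_returns.std() * np.sqrt(252)

print(avg_annual_return)
print(volatility)
```

On the real data, `demo` would be replaced by the loaded DataFrame, and `ma30` could be drawn next to `prices["AAA"]` with `plt.plot`.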

Scenarios are broader questions to help you develop an end-to-end project for your portfolio:

You are working for an investment firm that is looking to invest in index funds. They have provided you with a dataset containing the returns of 13 different indexes. Your manager has asked you to make short-term forecasts for several of the most promising indexes to help them decide which would be a good fund to include. Your analysis should also include a discussion of the associated risks and volatility of each fund you focus on.

You will need to prepare a report that is accessible to a broad audience. It should outline your motivation, steps, findings, and conclusions.


Stock Exchange Data

This dataset consists of stock exchange data since 1965 for several indexes. It contains the daily stock prices along with the volume traded each day. We shall start our analysis by importing the required libraries.

#Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats  # scipy.stats.stats is a deprecated private module

Here, we import the data and display the first few rows using the '.head()' method. The data contains 104224 rows and 9 columns, as shown by the '.shape' attribute.

#Import data
df = pd.read_csv("stock_data.csv", index_col=None)
display(df.head())
df.shape
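
By default, `pd.read_csv` leaves date strings as plain text, so time-based operations need the `Date` column parsed first. The sketch below shows one way to do that with `parse_dates`. The three rows are made up and only copy the column layout from the data dictionary; the date format of the real file is an assumption.

```python
import io
import pandas as pd

# Made-up rows that only mimic the column layout described in the data dictionary.
csv_text = """Index,Date,Open,High,Low,Close,Adj Close,Volume,CloseUSD
AAA,2020-01-02,100.0,101.0,99.0,100.5,100.5,1000,100.5
AAA,2020-01-03,100.5,102.0,100.0,101.5,101.5,1200,101.5
BBB,2020-01-02,50.0,50.5,49.5,50.2,50.2,800,6.5
"""

# parse_dates converts the Date column to datetime64 while reading.
demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Sort so that each index's rows are in chronological order.
demo = demo.sort_values(["Index", "Date"]).reset_index(drop=True)

print(demo.dtypes["Date"])
print(demo["Date"].min(), demo["Date"].max())
```

For the real file, the same `parse_dates=["Date"]` argument could be passed to the `pd.read_csv("stock_data.csv")` call, provided the dates are stored in a format pandas can recognize.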

There appear to be no missing values and no duplicate rows, as the checks below show.

#Missing value check (printed explicitly so it shows even when it is not the last line of a cell)
print(df.isna().sum())
#Duplicate check
print(df.duplicated().sum())
#Confirm data types
df.info()
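
To make clear what these checks return, here is a small sketch on a made-up frame that deliberately contains one missing value and one duplicated row. The frame, its values, and the clean-up order (duplicates first, then missing values) are illustrative assumptions, not steps the real dataset needed.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one duplicated row and one missing Close value.
demo = pd.DataFrame({
    "Index": ["AAA", "AAA", "BBB", "BBB"],
    "Close": [100.0, 100.0, np.nan, 50.0],
})

# Number of missing values in each column.
missing_per_column = demo.isna().sum()

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(demo.duplicated().sum())

# One possible clean-up: drop repeated rows, then rows with missing values.
cleaned = demo.drop_duplicates().dropna()

print(missing_per_column)
print(duplicate_rows)
print(cleaned)
```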

Our independent variable here is the index. The bar plot below shows the mean opening price for each index.

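# Mean opening price per index (barplot aggregates with the mean by default)
plt.figure(figsize=(8, 7))
g = sns.barplot(x="Index", y="Open", data=df, errorbar=None)  # use ci=None on seaborn < 0.12
g.tick_params(axis="x", rotation=90)  # rotate tick labels; set_xticklabels() requires explicit labels
plt.show()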
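
The numbers behind a bar plot like this can also be computed directly, since `sns.barplot` shows the mean of `y` for each category by default. Below is a minimal sketch on made-up values; the tickers and prices are hypothetical.

```python
import pandas as pd

# Hypothetical opening prices for two made-up tickers.
demo = pd.DataFrame({
    "Index": ["AAA", "AAA", "BBB", "BBB"],
    "Open": [100.0, 110.0, 40.0, 60.0],
})

# Mean opening price per index, highest first.
mean_open = demo.groupby("Index")["Open"].mean().sort_values(ascending=False)

print(mean_open)
```

If `Open` is quoted in each exchange's local currency, which the separate `CloseUSD` column suggests, raw means like these are not directly comparable across indexes.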