Markowitz & Tech Stocks
This notebook simulates markowitz portfolio for every semester using the stocks prices from the last semester. It includes 3 parts:
- Data Preparation
- Portfolio Simulation
- Data Visualization
Part of the code is from: https://towardsdatascience.com/python-markowitz-optimization-b5e1623060f5
Feedbacks are appreciated.
Data Preparation
Importing libraries and Datasets
# libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
# Datasets
df1 = pd.read_csv("data/AAPL.csv")
df2 = pd.read_csv("data/AMZN.csv")
df3 = pd.read_csv("data/BABA.csv")
df4 = pd.read_csv("data/CRM.csv")
df5 = pd.read_csv("data/FB.csv")
df6 = pd.read_csv("data/GOOG.csv")
df7 = pd.read_csv("data/INTC.csv")
df8 = pd.read_csv("data/MSFT.csv")
df9 = pd.read_csv("data/NVDA.csv")
df10 = pd.read_csv("data/TSLA.csv")
Checking datasets
df1.head()
For our simulation we'll use only Close prices. We can see that's two columns for this price. The "nominal" one and other that is adjusted for events such as split and/or reverse split. Another observation is the period of time for each stock.
To keep it simple, I'll select stocks that have no splits and have the same period of time.
To check if there was splits, I'll compare close and adjusted close prices from the earliest date possible. If its different, than there was a split.
let's build a table with the following columns:
- Earliest date;
- Total return for each stock;
- Binary column telling if there was any split (1 = yes, 0 = no).
frames = [df1, df2, df3, df4, df5, df6, df7, df8, df9, df10]
stocks =["AAPL", "AMZN", "BABA", "CRM", "FB", "GOOG", "INTC", "MSFT", "NVDA","TSLA"]
dates = []
returns = []
splits = []
for i in range(len(frames)):
parcial = frames[i]
date = parcial.iloc[0][0]
dates.append(date)
return_ = round(100*((parcial.iloc[-1][4]/parcial.iloc[0][4])-1),2)
return_ = str(return_) + str('%')
returns.append(return_)
parcial.sort_values(by=['Date'], inplace=True, ascending=False)
close = parcial.iloc[-1][4]
adj_close = parcial.iloc[-1][5]
if close != adj_close:
split = 1
else:
split = 0
splits.append(split)
dfs = []
for df in frames:
df = df.round(2)
dfs.append(df)
summary = pd.DataFrame(list(zip(stocks, dates, returns, splits)),
columns =['Stock', 'Initial Date','Returns', 'Splits'])
summary.sort_values(by=['Initial Date'], inplace=True, ascending=False)
summary.head(10)
Dataframe adjustments
We can see that only 3 stocks satisfy the two conditions:
- Amazon (AMZN)
- Google (GOOG)
- Salesforce (CRM)
These 3 stocks have the same period of time, as we can see the earliest date is the same for each one. But let's check if there isn't any missing date (e.g: let's say we have prices for AMZN in 2015-01-04, but not for GOOG).
# Adjusting dataframes
df2 = df2[['Date','Close']]
df4 = df4[['Date','Close']]
df6 = df6[['Date','Close']]
# Merge Dataframes
final = df2.merge(df4, how='outer', on='Date')
final = final.merge(df6, how='outer', on='Date')
# Check for null values
final.isna().sum()
There's no missing date. I create new dataframes for the 3 stocks with only date and close price. Let me add a column for the semester.
sel_stocks1 = [df2,df4,df6]
for i in range(len(sel_stocks1)):
parcial = sel_stocks1[i]
# Column for date
parcial['Date'] = pd.to_datetime(df2['Date'])
parcial.sort_values(by=['Date'], inplace=True, ascending=True)
# Column for Semester
parcial['Semester'] = parcial['Date'].dt.year.astype(str) + '-S'+ np.where(parcial['Date'].dt.quarter.gt(2),2,1).astype(str)
# Round close prices
parcial['Close'] = parcial['Close'].round(2)
# Let's drop the semester (2021-S2) as we only have few days of it
sel_stocks = []
for i in range(len(sel_stocks1)):
parcial = sel_stocks1[i]
parcial = parcial[parcial['Semester']!='2021-S2']
sel_stocks.append(parcial)
check = sel_stocks[1]
check.tail()
Data Visualization
Let's check how the stocks performed over the time. To do so I'll normalize prices such as all 3 stocks will start at price 100.