Descriptive Analysis of Semiconductor Stocks’ Log-Returns
1. Data Collection and Preparation
In this analysis, we examine the historical price performance of four leading semiconductor stocks—NVIDIA (NVDA), AMD, Broadcom (AVGO), and Taiwan Semiconductor Manufacturing Company (TSM)—over the period from January 1, 2010, to January 29, 2025. The adjusted closing prices were retrieved from Yahoo Finance, forming the basis for further statistical computations.
After collecting the data, we calculated log-returns, which are better suited to statistical analysis than simple percentage changes: log-returns are additive over time, so multi-period returns are simple sums of daily ones, which simplifies both risk and performance assessment.
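The additivity claim can be checked on a toy price path (the numbers here are purely hypothetical, not taken from the dataset):

```python
import numpy as np

# Hypothetical two-day price path: 100 -> 110 -> 99
prices = np.array([100.0, 110.0, 99.0])

# Daily log-returns: log(P_t / P_{t-1})
log_rets = np.diff(np.log(prices))

# Log-returns are additive over time: the sum of daily log-returns
# equals the log-return over the whole holding period.
total_from_daily = log_rets.sum()
total_direct = np.log(prices[-1] / prices[0])
print(np.isclose(total_from_daily, total_direct))  # True
```

Simple percentage returns do not share this property: +10% followed by -10% does not sum to the two-day return.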
2. Summary Statistics
To assess the fundamental characteristics of stock returns, we computed various descriptive statistics:
- Mean Return (%): The average daily return, scaled to percentage terms.
- Standard Deviation (%): A measure of return volatility, indicating risk.
- Skewness: The asymmetry of the return distribution; positive values indicate a right-skewed distribution (frequent small losses, occasional large gains), while negative values indicate the opposite.
- Kurtosis: Measures the “tailedness” of the distribution; a higher value suggests frequent extreme returns.
- Excess Kurtosis: The degree to which kurtosis deviates from a normal distribution (where excess kurtosis = 0).
- Minimum and Maximum Returns (%): The most extreme daily log-returns.
- Quantiles (5%, 25%, 50%, 75%, 95%): Locate the return distribution at selected percentiles, giving a fuller picture than the mean and extremes alone.
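One practical detail worth flagging for the kurtosis figures: scipy's `kurtosis` uses Fisher's definition by default, so it already returns excess kurtosis. A quick check on simulated normal data (illustrative only, not the stock returns):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)
x = rng.standard_normal(100_000)  # simulated normal data

# Default (fisher=True) returns EXCESS kurtosis: ~0 for normal data.
excess = kurtosis(x)
# fisher=False returns raw kurtosis: ~3 for normal data.
raw = kurtosis(x, fisher=False)
print(round(raw - excess, 6))  # 3.0
```

Subtracting 3 from the default scipy output would therefore subtract 3 twice.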
3. Statistical Tests
Beyond basic descriptive statistics, we applied several econometric tests to assess the statistical properties of stock returns:
- Jarque-Bera Test: Evaluates whether the return distribution deviates from normality by considering skewness and kurtosis. A significant p-value (< 0.05) suggests non-normality.
- Ljung-Box (LB) Test (Lag 1): Assesses autocorrelation in returns. A significant result points to serial dependence in the series, which is at odds with the weak form of the efficient market hypothesis.
- Box-Pierce (BP) Test (Lag 1): The simpler precursor of the Ljung-Box test; the Ljung-Box statistic adds a small-sample correction, and the two converge in large samples.
- Number of Observations: The count of valid return observations in the dataset.
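As a sanity check on the Jarque-Bera mechanics, a fat-tailed sample (here simulated from a Student-t distribution with 3 degrees of freedom, not drawn from the actual stock data) should be firmly rejected as non-normal:

```python
import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(0)
heavy_tailed = rng.standard_t(df=3, size=5_000)  # simulated fat-tailed "returns"

# Fat tails inflate sample kurtosis, which drives the JB statistic up
stat, pval = jarque_bera(heavy_tailed)
print(pval < 0.05)  # True: normality is rejected
```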
4. Key Findings
- Volatility and Risk: The standard deviations provide a comparative measure of risk among the four semiconductor stocks. A higher standard deviation indicates greater price fluctuations.
- Non-Normality: The Jarque-Bera results indicate whether the return distributions deviate significantly from normality; pronounced skewness and heavy tails are a well-documented feature of financial time series.
- Autocorrelation Analysis: Significant Ljung-Box and Box-Pierce statistics suggest linear dependence in returns, i.e., past returns carry some information about future returns.
This statistical overview serves as a foundation for further risk modeling, portfolio optimization, and investment decision-making in the semiconductor sector.
import pandas as pd
import numpy as np
import yfinance as yf
start_date = "2010-01-01"
end_date = "2025-01-29"
tickers = ["NVDA", "AMD", "AVGO","TSM"]
adjCP = []
for ticker in tickers:
    stock_data = yf.download(ticker, start=start_date, end=end_date)
    adjCP.append(stock_data["Close"])
# Use concatenate to build the dataframe
adjCP = pd.concat(adjCP, axis=1)
# rename columns
adjCP.columns = tickers
adjCP.tail()
# Calculate log-returns for each stock
lret = np.log(adjCP / adjCP.shift(1)).dropna()  # drop the first row, which is NaN
lret.head()
# descriptive stats of the sample
from scipy.stats import skew, kurtosis, jarque_bera
import statsmodels.api as sm
def multi_fun(x):
    # run Ljung-Box and Box-Pierce once and reuse the result table
    lb = sm.stats.diagnostic.acorr_ljungbox(x, lags=1, boxpierce=True)
    jb_stat, jb_pval = jarque_bera(x)
    stat_tab = {
        'Mean': round(np.mean(x) * 100, 3),
        'St.Deviation': round(np.std(x) * 100, 3),
        'Skewness': round(skew(x), 3),
        # scipy's kurtosis() returns EXCESS kurtosis by default,
        # so the raw kurtosis needs fisher=False
        'Kurtosis': round(kurtosis(x, fisher=False), 3),
        'Excess.Kurtosis': round(kurtosis(x), 3),
        'Min': round(np.min(x) * 100, 3),
        'Quant5': round(np.quantile(x, 0.05) * 100, 3),
        'Quant25': round(np.quantile(x, 0.25) * 100, 3),
        'Median': round(np.quantile(x, 0.50) * 100, 3),
        'Quant75': round(np.quantile(x, 0.75) * 100, 3),
        'Quant95': round(np.quantile(x, 0.95) * 100, 3),
        'Max': round(np.max(x) * 100, 3),
        'JB.stat': round(jb_stat, 3),
        'JB.pval': round(jb_pval, 3),
        'LB.lag1.stat': round(lb['lb_stat'].iloc[0], 3),
        'LB.lag1.pval': round(lb['lb_pvalue'].iloc[0], 3),
        'BP.lag1.stat': round(lb['bp_stat'].iloc[0], 3),
        'BP.lag1.pval': round(lb['bp_pvalue'].iloc[0], 3),
        'N.obs': len(x)
    }
    return stat_tab
# apply the function to each asset
output = lret.apply(multi_fun, axis=0)
descr_stat = pd.DataFrame(output.tolist(), index=output.index).T
print(descr_stat)
# export it to csv
descr_stat.to_csv('descr_stat.csv')
Return since 2010
import matplotlib.pyplot as plt
# compute the compounded returns (initial investment = 100 USD)
def R100(x):
    init_inv = 100
    # x holds log-returns, so compound via exp of the cumulative sum
    # rather than (1 + x).cumprod(), which applies to simple returns
    R = np.exp(x.cumsum()) * init_inv
    return R
R = lret.apply(R100)
# Plot R
plt.figure(figsize=(10, 6))
plt.plot(R.index, R['NVDA'], linestyle='-', label = "NVDA")
plt.plot(R.index, R['AMD'], linestyle='-', label = "AMD")
plt.plot(R.index, R['AVGO'], linestyle='-', label = "AVGO")
plt.plot(R.index, R['TSM'], linestyle='-', label = "TSM")
plt.title('Cumulative Returns (100 USD)')
plt.ylabel('USD')
plt.legend()
plt.grid(True)
plt.show()
Return since 2020
# repeat the analysis for the period since 2020
lret20 = lret.loc['2020-01-01':]
R2020 = lret20.apply(R100)
# Plot R2020
plt.figure(figsize=(10, 6))
plt.plot(R2020.index, R2020['NVDA'], linestyle='-', label = "NVDA")
plt.plot(R2020.index, R2020['AMD'], linestyle='-', label = "AMD")
plt.plot(R2020.index, R2020['AVGO'], linestyle='-', label = "AVGO")
plt.plot(R2020.index, R2020['TSM'], linestyle='-', label = "TSM")
plt.axhline(y=100, color='black', linestyle='-', linewidth=3, zorder=1)
plt.title('Cumulative Returns (100 USD)')
plt.ylabel('USD')
plt.legend()
plt.grid(True)
plt.show()
One-year return since 2024
# repeat the analysis for the most recent year
lret24 = lret.loc['2024-01-01':]
R2024 = lret24.apply(R100)
# Plot R2024
plt.figure(figsize=(10, 6))
plt.plot(R2024.index, R2024['NVDA'], linestyle='-', label = "NVDA")
plt.plot(R2024.index, R2024['AMD'], linestyle='-', label = "AMD")
plt.plot(R2024.index, R2024['AVGO'], linestyle='-', label = "AVGO")
plt.plot(R2024.index, R2024['TSM'], linestyle='-', label = "TSM")
plt.axhline(y=100, color='black', linestyle='-', linewidth=3, zorder=1)
plt.title('Cumulative Returns (100 USD)')
plt.ylabel('USD')
plt.legend()
plt.grid(True)
plt.show()
Compute the time-varying performance
In this section, we compute and analyze the time-varying performance metrics for the selected assets. The metrics include the Sharpe Ratio (ShR), Value at Risk (VaR), modified Value at Risk (mVaR), and modified Sharpe Ratio (mShR). These metrics are calculated using a rolling window approach to capture the dynamic nature of the financial markets.
Sharpe Ratio (ShR)
The Sharpe Ratio is a measure of risk-adjusted return. It is calculated as the mean excess return (over the risk-free rate) divided by the standard deviation of the excess return.
Value at Risk (VaR)
Value at Risk is a statistical measure that quantifies the level of financial risk within a portfolio over a specific time frame. It is defined as the maximum loss not exceeded with a given confidence level.
Modified Value at Risk (mVaR)
The modified Value at Risk used here averages the worst losses rather than taking a single quantile: the returns are sorted and the mean of those below the α-quantile is taken. This quantity is also known as the expected shortfall (or conditional VaR).
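A toy numerical comparison (hypothetical returns, with a deliberately coarse α = 0.2 so the tail contains more than one observation) shows that the tail mean is at least as severe as the quantile itself:

```python
import numpy as np

rets = np.array([-0.05, -0.03, -0.01, 0.00, 0.01,
                 0.02, 0.02, 0.03, 0.04, 0.05])  # hypothetical daily returns
alpha = 0.2

# Historical VaR: the alpha-quantile of the return distribution
var = np.quantile(rets, alpha)

# "Modified" VaR in this document's sense: mean of the worst alpha share
tail = np.sort(rets)[: int(alpha * len(rets))]
m_var = tail.mean()

print(m_var <= var)  # True: the tail average is the more severe figure
```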
Modified Sharpe Ratio (mShR)
The modified Sharpe Ratio is a risk-adjusted return measure that uses the modified Value at Risk instead of the standard deviation.
Rolling Window Analysis
A rolling window of 252 trading days (approximately one year) is used to compute these metrics over time. This approach allows us to observe how the performance metrics evolve and respond to market conditions. The rolling window analysis is performed for each asset, and the results are stored in dataframes with appropriate time indices.
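The rolling idea can also be sketched with pandas' built-in `rolling()`; the series below is synthetic, not the actual stock returns, and the metric definitions are close analogues of the ShR and VaR functions defined below:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
rets = pd.Series(rng.normal(0.0005, 0.02, 600))  # synthetic daily log-returns

w = 252  # one trading year

# Rolling annualized Sharpe ratio (risk-free rate assumed zero)
roll_shr = (rets.rolling(w).mean() * 252) / (rets.rolling(w).std() * np.sqrt(252))

# Rolling historical 5% daily VaR (return quantile within each window)
roll_var = rets.rolling(w).quantile(0.05)

# The first w-1 entries are NaN because the window is incomplete
print(roll_shr.dropna().shape[0], roll_var.dropna().shape[0])
```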
The computed metrics are then visualized to provide insights into the risk and return characteristics of the assets over the selected time period.
# define the sharpe ratio function
def ShR(ret, rf=0):
    mu = np.mean(ret - rf) * 252       # annualized mean excess return
    std = ret.std() * np.sqrt(252)     # annualized volatility
    return mu / std * 100              # in percentage points
# define the VaR function
def VaR(returns, alpha):
    # np.quantile expects alpha on the 0-1 scale
    # (np.percentile expects 0-100, so passing 0.05 to it would be wrong)
    return np.quantile(returns, alpha) * 252
# define the mVaR function
def mVaR(returns, alpha):
    sorted_returns = np.sort(returns)
    index = int(alpha * len(sorted_returns))     # size of the alpha tail
    return np.mean(sorted_returns[:index]) * 252
# define the mShR function
def mShR(ret, rf, alpha):
    e = ret - rf
    mu = np.mean(e) * 252
    m_VaR = mVaR(e, alpha)
    # mVaR is negative for the loss tail, so divide by its magnitude
    # to keep the ratio's sign aligned with the mean excess return
    return mu / abs(m_VaR)
# set up the rolling-window loop
w = 252                   # window length: one trading year
T = lret.shape[0]         # number of observations (days)
nSubsets = T - w          # number of rolling windows
assets = len(tickers)     # number of assets
alpha = .05               # tail probability for VaR/mVaR
# empty arrays to collect the output (one per metric)
var = np.empty((nSubsets, assets))
shr = np.empty((nSubsets, assets))
mvar = np.empty((nSubsets, assets))
mshr = np.empty((nSubsets, assets))
for j in range(nSubsets):
    # select the w observations of window j (iloc slicing excludes the end point)
    subset_j = lret.iloc[j:j + w]
    # compute the metrics on this window
    var[j, :] = subset_j.apply(VaR, alpha=alpha)
    shr[j, :] = subset_j.apply(ShR, rf=0)
    mvar[j, :] = subset_j.apply(mVaR, alpha=alpha)
    mshr[j, :] = subset_j.apply(mShR, rf=0, alpha=alpha)
# add a time index to the metrics and transform them into dataframes
VaR_t = pd.DataFrame(var, index=lret.index[w:], columns=tickers)
ShR_t = pd.DataFrame(shr, index=lret.index[w:], columns=tickers)
mVaR_t = pd.DataFrame(mvar, index=lret.index[w:], columns=tickers)
mShR_t = pd.DataFrame(mshr, index=lret.index[w:], columns=tickers)