STA 354 ASSIGNMENT
Generate three (3) variables using your own sample size; hence:
a. Obtain the regression analysis.
b. Obtain the standard errors.
c. Obtain the t-tests.
d. Obtain R-squared.
e. Obtain the confidence intervals.
f. Obtain the p-values.
Summary
-
The regression coefficient for the predictor x1 is 2.44046. This means that, holding x2 and x3 constant, the predicted y increases by 2.44046 for each one-unit increase in x1. This interpretation may be problematic, however, if the coefficient is not statistically significant.
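To illustrate the "holding x2 and x3 constant" interpretation, here is a minimal R sketch. It simulates data the same way as the script below, but the seed (2024) is an arbitrary assumption, so its estimates will not reproduce 2.44046. It then compares predictions for two hypothetical observations that differ only by one unit in x1. Names such as `fit`, `base` and `shifted` are illustrative only.

```r
# Sketch only: data simulated as in the script; the seed 2024 is an arbitrary assumption
set.seed(2024)
n  <- 38
x1 <- runif(n, 3, 7)
x2 <- runif(n, 2.3, 7)
x3 <- abs(rnorm(n, 3, 0.72))
y  <- 1.3 + 2.6 * x1 + 2.2 * x2 + 3.1 * x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)

# Two hypothetical observations that differ only by one unit in x1
base    <- data.frame(x1 = 5, x2 = 4, x3 = 3)
shifted <- transform(base, x1 = x1 + 1)
change  <- predict(fit, newdata = shifted) - predict(fit, newdata = base)

# The change in the predicted y equals the estimated x1 coefficient
print(c(change = unname(change), b1 = unname(coef(fit)["x1"])))
```

The same check works for x2 or x3 by shifting that column instead.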
-
The corresponding t statistics are t1 = 0.0793, t2 = 0.025 and t3 = 0.154.
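Each t statistic is the estimate divided by its standard error. The R sketch below computes this by hand from the matrix formulas used in the script and compares the result with the t values reported by `lm()`. It assumes simulated data with an arbitrary seed (2024), so the numbers differ from those quoted above. The names `X`, `xtx_inv` and `t_stat` are illustrative.

```r
# Sketch only: data simulated as in the script; the seed 2024 is an arbitrary assumption
set.seed(2024)
n  <- 38
x1 <- runif(n, 3, 7)
x2 <- runif(n, 2.3, 7)
x3 <- abs(rnorm(n, 3, 0.72))
y  <- 1.3 + 2.6 * x1 + 2.2 * x2 + 3.1 * x3 + rnorm(n)

X       <- cbind(1, x1, x2, x3)             # design matrix with an intercept column
k       <- ncol(X)                          # number of parameters (4)
xtx_inv <- solve(t(X) %*% X)
b       <- as.vector(xtx_inv %*% t(X) %*% y) # least-squares estimates
mse     <- sum((y - X %*% b)^2) / (n - k)   # SSE / (n - k)
se      <- sqrt(diag(mse * xtx_inv))        # standard errors
t_stat  <- b / se                           # t = estimate / standard error

# Compare with the t values reported by lm()
fit <- lm(y ~ x1 + x2 + x3)
print(cbind(by_hand = t_stat, lm = coef(summary(fit))[, "t value"]))
```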
-
Under H0: beta_j = 0, each t statistic follows a t distribution with n - k = 38 - 4 = 34 degrees of freedom.
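The degrees of freedom depend only on the sample size and the number of fitted parameters, not on the data values. The R sketch below fits a model to arbitrary random data with n = 38 and four parameters, then reads off the residual degrees of freedom and the two-sided 5% critical value. The data frame `d` is purely illustrative.

```r
# Sketch only: any data set with n = 38 rows and k = 4 parameters gives the same df
n <- 38
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), y = rnorm(n))
fit <- lm(y ~ x1 + x2 + x3, data = d)

df_resid <- df.residual(fit)     # n - k = 38 - 4 = 34
t_crit   <- qt(0.975, df_resid)  # two-sided 5% critical value, about 2.03
print(c(df = df_resid, t_crit = t_crit))
```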
-
From the output, we can also see that the multiple R-squared (coefficient of determination) is 0.06. Therefore, about 6% of the variation in y can be explained by the multiple linear regression with x1, x2, and x3 as the predictors.
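R-squared is SSR/SST, the share of the total variation in y explained by the model. The R sketch below computes it, together with the adjusted R-squared used in the script, from sums of squares and checks both against `summary()`. It assumes simulated data with an arbitrary seed (2024), so it will not reproduce the 0.06 quoted above.

```r
# Sketch only: data simulated as in the script; the seed 2024 is an arbitrary assumption
set.seed(2024)
n  <- 38
x1 <- runif(n, 3, 7)
x2 <- runif(n, 2.3, 7)
x3 <- abs(rnorm(n, 3, 0.72))
y  <- 1.3 + 2.6 * x1 + 2.2 * x2 + 3.1 * x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)

sst    <- sum((y - mean(y))^2)   # total sum of squares
sse    <- sum(residuals(fit)^2)  # error (residual) sum of squares
ssr    <- sst - sse              # regression sum of squares
r2     <- ssr / sst              # coefficient of multiple determination
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 4)

print(c(r2 = r2, lm = summary(fit)$r.squared))
print(c(adj_r2 = adj_r2, lm = summary(fit)$adj.r.squared))
```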
-
The 95% confidence intervals for the regression coefficients are obtained with confint(). If an interval does not contain the value 0, we can conclude that there is a statistically significant association between that predictor and y at the 5% level. If it does contain 0, the association is not statistically significant.
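A 95% interval is estimate ± t(0.975, n - k) × standard error. The R sketch below builds the intervals by hand and compares them with `confint()`. It also checks that an interval excludes 0 exactly when the two-sided p-value is below 0.05. The data are simulated with an arbitrary seed (2024), and `tab` and `ci` are illustrative names.

```r
# Sketch only: data simulated as in the script; the seed 2024 is an arbitrary assumption
set.seed(2024)
n  <- 38
x1 <- runif(n, 3, 7)
x2 <- runif(n, 2.3, 7)
x3 <- abs(rnorm(n, 3, 0.72))
y  <- 1.3 + 2.6 * x1 + 2.2 * x2 + 3.1 * x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)

tab    <- coef(summary(fit))[-1, ]   # drop the intercept row
t_crit <- qt(0.975, df.residual(fit))
ci     <- cbind(lower = tab[, "Estimate"] - t_crit * tab[, "Std. Error"],
                upper = tab[, "Estimate"] + t_crit * tab[, "Std. Error"])
print(ci)
print(confint(fit, c("x1", "x2", "x3"), level = 0.95))

# An interval excludes 0 exactly when the two-sided p-value is below 0.05
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(cbind(excludes_zero, p_below_0.05 = tab[, "Pr(>|t|)"] < 0.05))
```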
-
A coefficient is statistically significant at the 5% level when its p-value is less than 0.05. When the p-value is greater than 0.05, we cannot conclude that the coefficient differs from 0.
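The two-sided p-value is 2 × P(T > |t|) with n - k degrees of freedom. The R sketch below computes it by hand and compares it with the `Pr(>|t|)` column from `summary()`. It also shows why the absolute value matters: plugging a negative t into 2 * (1 - pt(t, df)) gives a number above 1. The data use an arbitrary seed (2024), and `p_by_hand` is an illustrative name.

```r
# Sketch only: data simulated as in the script; the seed 2024 is an arbitrary assumption
set.seed(2024)
n  <- 38
x1 <- runif(n, 3, 7)
x2 <- runif(n, 2.3, 7)
x3 <- abs(rnorm(n, 3, 0.72))
y  <- 1.3 + 2.6 * x1 + 2.2 * x2 + 3.1 * x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)

tab       <- coef(summary(fit))
p_by_hand <- 2 * pt(-abs(tab[, "t value"]), df.residual(fit))  # two-sided p-value
print(cbind(by_hand = p_by_hand, lm = tab[, "Pr(>|t|)"]))

# Without abs(), a negative t value gives a "p-value" above 1
print(2 * (1 - pt(-1.5, 34)))
```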
#Name: Owoyemi Qudus Adebayo
#Matric number : 19/56EG109
#STA 354 ASSIGNMENT
# Generate three (3) variables using your own sample size; hence:
# (a) obtain the regression analysis, (b) the standard errors, (c) the t-tests,
# (d) R-squared, (e) the confidence intervals and (f) the p-values
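# Generate three random predictors
rm(list = ls())
n = 38   # sample size
k = 4    # number of regression parameters (intercept + 3 slopes)
x1 = runif(n, 3, 7)
x2 = runif(n, 2.3, 7)
x3 = abs(rnorm(n, 3, 0.72))
# True coefficients (intercept, x1, x2, x3) used to simulate y
beta = c(1.3, 2.6, 2.2, 3.1)
# Design matrix: a column of ones for the intercept, then the predictors
x = cbind(1, x1, x2, x3)
x
# Random errors and the simulated response y = X beta + e
e = rnorm(n)
y = x %*% beta + e
y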
# (a) Fit the multiple linear regression and obtain the estimated coefficients
humblereg = lm(y ~ x1 + x2 + x3)
# beta.hat = (X'X)^(-1) X'y, the least-squares estimates computed by hand
xtx.inv = solve(t(x) %*% x)
beta.hat = xtx.inv %*% t(x) %*% y
beta.hat
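# SSR (regression sum of squares) = beta.hat' X'y - n * ybar^2
y.bar = mean(y)
SSR = t(beta.hat) %*% t(x) %*% y - n * (y.bar^2)
SSR
# SST (total sum of squares) = y'y - n * ybar^2
SST = t(y) %*% y - n * (y.bar^2)
SST
# SSE (error sum of squares) = SST - SSR
SSE = SST - SSR
# MSE = SSE / (n - k), the estimate of the error variance
MSE = SSE / (n - k)
MSE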
# Estimated covariance matrix of beta.hat: MSE * (X'X)^(-1)
MSE = as.numeric(MSE)   # convert the 1x1 matrix to a scalar
cov_mat = MSE * xtx.inv
cov_mat
# (b) Standard errors of the slope estimates: square roots of the diagonal of cov_mat
SE_B1 = sqrt(cov_mat[2,2])
SE_B2 = sqrt(cov_mat[3,3])
SE_B3 = sqrt(cov_mat[4,4])
SE_B1
SE_B2
SE_B3
# (c) t statistics for H0: beta_j = 0, computed as estimate / standard error
B_1 = beta.hat[2,]
B_2 = beta.hat[3,]
B_3 = beta.hat[4,]
t1=B_1/SE_B1
t1
t2 = B_2/SE_B2
t2
t3 = B_3/SE_B3
t3
# (e) 95% confidence intervals for the slope coefficients
confint(humblereg, c('x1', 'x2', 'x3'), level = 0.95)
# Each t statistic follows a t distribution with n - k = 38 - 4 = 34 degrees of freedom
df = n - k
# (d) R_square (coefficient of multiple determination)
R_square = SSR / SST
R_square
# Adjusted R-squared
Radj = 1 - ((1 - R_square) * ((n - 1) / (n - k)))
Radj
# (f) Two-sided p-values, 2 * P(T > |t|) with df degrees of freedom.
# abs() keeps the p-value between 0 and 1 when a t statistic is negative.
# One run gave p_value_x1 = 0.263, p_value_x2 = 0.821 and p_value_x3 = 0.888.
p_value_x1 = 2 * pt(-abs(t1), df)
print(p_value_x1)
p_value_x2 = 2 * pt(-abs(t2), df)
print(p_value_x2)
p_value_x3 = 2 * pt(-abs(t3), df)
print(p_value_x3)
# In that run all three p-values exceed 0.05, so none of the coefficients is statistically significant at the 5% level.
# Summary of the fitted model (estimates, standard errors, t values, p-values, R-squared)
summary(humblereg)
#Question 2