Two Sample Tests.
Comparing Means (known σ² or large number of samples)
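Two independent normal random variables, $X$ and $Y$:
$X \sim \text{Normal}(\mu_X, \sigma_X^2)$, $Y \sim \text{Normal}(\mu_Y, \sigma_Y^2)$
Take samples of the random variables:
$n_X$ random samples of $X$: $X_1, \dots, X_{n_X}$; $n_Y$ random samples of $Y$: $Y_1, \dots, Y_{n_Y}$
Look at sample means:
$\bar{X}$ has mean $\mu_X$ and variance $\sigma_X^2 / n_X$; $\bar{Y}$ has mean $\mu_Y$ and variance $\sigma_Y^2 / n_Y$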
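Test null hypothesis
H0: $\mu_X = \mu_Y$
$\bar{X} - \bar{Y}$ has mean $\mu_X - \mu_Y$ and variance $\sigma_X^2 / n_X + \sigma_Y^2 / n_Y$
Use test statistic
$Z = \dfrac{\bar{X} - \bar{Y}}{\sqrt{\sigma_X^2 / n_X + \sigma_Y^2 / n_Y}} \sim \text{Normal}(0,1)$ under H0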
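A minimal sketch of this test in R, assuming H0 is that the two means are equal and that both standard deviations are known. The helper name `two_sample_z` and the simulated data are illustrative assumptions, not part of the notes.

```r
# Hypothetical helper: two-sample z-test with known standard deviations.
two_sample_z <- function(x, y, sigma_x, sigma_y) {
  # Standard error of mean(x) - mean(y) when the variances are known
  se <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))
  z <- (mean(x) - mean(y)) / se
  # Two-sided p-value under Normal(0, 1)
  p <- 2 * pnorm(-abs(z))
  list(statistic = z, p.value = p)
}

set.seed(1)                          # illustrative data only
x <- rnorm(20, mean = 10, sd = 5)
y <- rnorm(40, mean = 12, sd = 8)
res <- two_sample_z(x, y, sigma_x = 5, sigma_y = 8)
res
```

Using `-abs(z)` keeps the two-sided p-value in [0, 1] whichever sample mean is larger.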
Required sample size
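To achieve significance level $\alpha$ and power $1 - \beta$ when the true difference in means is $\delta = \mu_X - \mu_Y$:
Recall: for one sample, $n = \dfrac{(z_{\alpha/2} + z_\beta)^2 \sigma^2}{\delta^2}$, where $z_p$ is the upper-$p$ quantile of Normal(0,1)
If $n_X = n_Y = n$, the variance $\sigma^2 / n$ is replaced by $\sigma_X^2 / n + \sigma_Y^2 / n = (\sigma_X^2 + \sigma_Y^2) / n$
....
So $n = \dfrac{(z_{\alpha/2} + z_\beta)^2 (\sigma_X^2 + \sigma_Y^2)}{\delta^2}$ (this formula appears again later when we look at degrees of freedom in the two-sample $t$-test)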
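As a hedged sketch, the sample-size calculation can be wrapped in a small R function. It assumes equal group sizes, a two-sided test, and the formula n = (z_{α/2} + z_β)² (σ_X² + σ_Y²) / δ², where the z values are upper quantiles of Normal(0,1). The function name `two_sample_n` and its argument names are hypothetical.

```r
# Hypothetical helper: per-group sample size for a two-sided two-sample z-test,
# assuming n_X = n_Y and known standard deviations.
two_sample_n <- function(delta, sigma_x, sigma_y, alpha = 0.05, beta = 0.1) {
  z_alpha <- qnorm(1 - alpha / 2)   # upper alpha/2 quantile
  z_beta  <- qnorm(1 - beta)        # upper beta quantile
  n <- (z_alpha + z_beta)^2 * (sigma_x^2 + sigma_y^2) / delta^2
  ceiling(n)                        # round up to a whole number of samples
}

two_sample_n(delta = 1, sigma_x = 2, sigma_y = 3)
```

Note that the quantile function `qnorm` is needed here; `pnorm` returns probabilities, not quantiles.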
Example:
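Suppose we expect $\mu_X = 10$, $\sigma_X = 5$ and $\mu_Y = 12$, $\sigma_Y = 8$.
We want to test against H0: $\mu_X = \mu_Y$
with $\alpha = 0.05$ and $\beta = 0.1$.
If we sample $n_X = n_Y = n$, the required $n$ is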
(qnorm(1 - .05/2) + qnorm(1 - .1))^2 * (5^2 + 8^2) / (2)^2   # about 234 per group; qnorm gives quantiles, pnorm gives probabilities
Maybe we manage to sample $n_X = 20$ and $n_Y = 40$:
X <- rnorm(20, 10, 5) ; Y <- rnorm(40, 12, 8)
In this case, the statistic for testing H0 is
Z <- (mean(X) - mean(Y)) / sqrt( 5^2 / 20 + 8^2 / 40 )
Z
2 * pnorm(-abs(Z))   # two-sided p-value; pnorm(Z)*2 would exceed 1 whenever Z > 0
Note to self:
Add some code later which uses bootstrapping to show that the estimated n above gives the correct power for the test.
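One way the check in this note might look is sketched below. It uses plain Monte Carlo simulation from the assumed normal populations rather than bootstrapping in the strict sense (resampling observed data), so treat it as a stand-in. The parameter values are taken from the example (means 10 and 12, standard deviations 5 and 8, α = 0.05, β = 0.1); the number of replications is an arbitrary choice.

```r
set.seed(42)
alpha <- 0.05; beta <- 0.1
sigma_x <- 5; sigma_y <- 8
mu_x <- 10; mu_y <- 12
delta <- mu_y - mu_x

# Per-group sample size from the known-variance formula, rounded up
n <- ceiling((qnorm(1 - alpha / 2) + qnorm(1 - beta))^2 *
             (sigma_x^2 + sigma_y^2) / delta^2)

reps <- 5000   # assumed number of simulated experiments
reject <- replicate(reps, {
  x <- rnorm(n, mu_x, sigma_x)
  y <- rnorm(n, mu_y, sigma_y)
  z <- (mean(x) - mean(y)) / sqrt(sigma_x^2 / n + sigma_y^2 / n)
  abs(z) > qnorm(1 - alpha / 2)   # TRUE when H0 is rejected
})

power_hat <- mean(reject)   # estimated power; should be close to 1 - beta
power_hat
```

If the sample-size formula is right, the fraction of rejections should land near 0.9, up to simulation noise.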
Comparing Means (unknown σ²: unequal variances)
Just like for the single sample setup, we'll replace the unknown variances $\sigma_X^2$ and $\sigma_Y^2$ with the sample variances $s_X^2$ and $s_Y^2$.
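Test null hypothesis
H0: $\mu_X = \mu_Y$
$\bar{X} - \bar{Y}$ has mean $\mu_X - \mu_Y$ and approximate variance $s_X^2 / n_X + s_Y^2 / n_Y$
where $s_X^2 = \frac{1}{n_X - 1} \sum_{i=1}^{n_X} (X_i - \bar{X})^2$ and $s_Y^2 = \frac{1}{n_Y - 1} \sum_{i=1}^{n_Y} (Y_i - \bar{Y})^2$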
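Use test statistic
$T = \dfrac{\bar{X} - \bar{Y}}{\sqrt{s_X^2 / n_X + s_Y^2 / n_Y}}$
which under H0 approximately follows a $t$-distribution with
$\nu = \dfrac{(s_X^2 / n_X + s_Y^2 / n_Y)^2}{\dfrac{(s_X^2 / n_X)^2}{n_X - 1} + \dfrac{(s_Y^2 / n_Y)^2}{n_Y - 1}}$ degrees of freedom (the Welch-Satterthwaite approximation)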
The degrees of freedom act as an "equivalent sample size" of $n_X$ and $n_Y$.
So
Summary:
Variance is the sum
Degrees of freedom is the average
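To make the degrees of freedom concrete, here is a minimal sketch that computes them with the Welch-Satterthwaite approximation, which is what R's `t.test` uses by default for unequal variances. Whether this is exactly the formula intended in these notes is an assumption; the helper name `welch_df` and the simulated data are illustrative.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom.
welch_df <- function(x, y) {
  vx <- var(x) / length(x)   # estimated variance of mean(x)
  vy <- var(y) / length(y)   # estimated variance of mean(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

set.seed(1)                  # illustrative data only
x <- rnorm(20, 10, 5)
y <- rnorm(40, 12, 8)
nu <- welch_df(x, y)
nu
```

The result always lies between min(n_X, n_Y) - 1 and n_X + n_Y - 2, which fits the reading of the degrees of freedom as a kind of combined sample size.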
Example:
Consider a similar setup as above.
We want to test against H0: $\mu_X = \mu_Y$
Quantile plots suggest that $\sigma_X \neq \sigma_Y$
(Later we'll discuss a hypothesis test to analytically check if variances are different.)
qqX <- qqnorm(X, plot.it = FALSE) # generate quantile plots for X
qqY <- qqnorm(Y, plot.it = FALSE) # generate quantile plots for Y
plot(range(qqX$x, qqY$x), # generate plot box
range(qqX$y, qqY$y), # with correct x and y limits
type="n", xlab='',ylab='') # and nothing inside
points(qqX) # plot X quantile points
points(qqY, col = 'red', pch = 3) # plot Y quantile points
abline(mean(X),sd(X)) # best fit line for X
abline(mean(Y),sd(Y), col='red') # best fit line for Y
Since we don't know the underlying variance, we'll use sample variance instead.
Our test statistic is
T <- (mean(X) - mean(Y)) / sqrt( sd(X)^2 / 20 + sd(Y)^2 / 40 )
T
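To turn the statistic into a decision, one option is to take the degrees of freedom from R's built-in `t.test`, which performs Welch's unequal-variance test by default, and compute a two-sided p-value from the t-distribution. The sketch below regenerates X and Y with an arbitrary seed so that it runs on its own; the name `T_stat` is used instead of `T` to avoid masking R's shorthand for `TRUE`.

```r
set.seed(1)                              # arbitrary seed, illustrative data
X <- rnorm(20, 10, 5)
Y <- rnorm(40, 12, 8)

# Test statistic with sample variances in place of the unknown variances
se <- sqrt(var(X) / length(X) + var(Y) / length(Y))
T_stat <- (mean(X) - mean(Y)) / se

fit <- t.test(X, Y)                      # Welch test (var.equal = FALSE by default)
df <- unname(fit$parameter)              # approximate degrees of freedom

# Two-sided p-value from the t-distribution with df degrees of freedom
p_manual <- 2 * pt(-abs(T_stat), df)
c(T = T_stat, df = df, p = p_manual)
```

The manual statistic and p-value should agree with `fit$statistic` and `fit$p.value`, which is a quick sanity check on the hand computation.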