Two Sample Tests.

Comparing Means (known σ² or large number of samples)

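Two independent normal random variables, $X$ and $Y$:

  • $X$ ~ Normal($\mu_X, \sigma_X^2$)
  • $Y$ ~ Normal($\mu_Y, \sigma_Y^2$)

Take samples of the random variables:

  • $n_X$ random samples of $X$: $X_1, \dots, X_{n_X}$
  • $n_Y$ random samples of $Y$: $Y_1, \dots, Y_{n_Y}$

Look at sample means:

  • $\bar X$ has mean $\mu_X$ and variance $\sigma_X^2 / n_X$
  • $\bar Y$ has mean $\mu_Y$ and variance $\sigma_Y^2 / n_Y$

Test null hypothesis H0: $\mu_X = \mu_Y$

  • $\bar X - \bar Y$ has mean $0$ and variance $\sigma_X^2 / n_X + \sigma_Y^2 / n_Y$

Use test statistic

  • $Z = \dfrac{\bar X - \bar Y}{\sqrt{\sigma_X^2 / n_X + \sigma_Y^2 / n_Y}}$ ~ Normal(0,1)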
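As a quick sanity check of the claim that the statistic is standard normal under H0, the sketch below simulates many experiments in which both means are equal and the variances are known, then looks at how often a two-sided 5% test rejects. The sample sizes, common mean, standard deviations, seed, and variable names are illustrative assumptions, not values taken from the notes.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility

reps    <- 10000                   # number of simulated experiments (assumed)
n_x     <- 20;  n_y     <- 40      # assumed sample sizes
sigma_x <- 5;   sigma_y <- 8       # assumed known standard deviations
mu      <- 10                      # common mean, so H0 is true

# One row per simulated experiment; rowMeans gives the sample means
xbar <- rowMeans(matrix(rnorm(reps * n_x, mu, sigma_x), nrow = reps))
ybar <- rowMeans(matrix(rnorm(reps * n_y, mu, sigma_y), nrow = reps))

# Z statistic: difference of sample means divided by its known standard deviation
z <- (xbar - ybar) / sqrt(sigma_x^2 / n_x + sigma_y^2 / n_y)

# Fraction of experiments rejected by a two-sided test at the 5% level
reject_rate <- mean(abs(z) > qnorm(0.975))
reject_rate
sd(z)
```

If the statistic really is standard normal, the rejection rate should land near 0.05 and `sd(z)` near 1, up to simulation noise.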
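
Required sample size

To achieve power $1-\beta$ for a given (two-sided) significance $\alpha$, by sampling $X$ and $Y$ both $n$ times, requires

$$n \ge \frac{(z_{\alpha/2} + z_{\beta})^2 \, (\sigma_X^2 + \sigma_Y^2)}{(\mu_X - \mu_Y)^2}$$

where $z_p$ is the standard normal quantile with tail probability $p$.

Recall: for the one sample $Z$-test the required $n$ was

$$n \ge \frac{(z_{\alpha/2} + z_{\beta})^2 \, \sigma^2}{(\mu - \mu_0)^2}$$

If $X$ and $Y$ are sampled a different number of times, then you need

$$\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y} \le \frac{(\mu_X - \mu_Y)^2}{(z_{\alpha/2} + z_{\beta})^2}$$

$n_X$ samples of $X$ and $n_Y$ samples of $Y$ is equivalent to $n$ samples of each according to the following rule:

$$\frac{\sigma_X^2 + \sigma_Y^2}{n} = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}$$

  • So $n = \dfrac{\sigma_X^2 + \sigma_Y^2}{\sigma_X^2 / n_X + \sigma_Y^2 / n_Y}$ (this formula appears again later when we look at degrees of freedom in the two sample $t$-test)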
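The two rules above can be sketched as small R helpers. This assumes the standard two-sided formula $n \ge (z_{\alpha/2} + z_\beta)^2 (\sigma_X^2 + \sigma_Y^2) / \delta^2$ with $\delta = \mu_X - \mu_Y$, and the equivalence rule $(\sigma_X^2 + \sigma_Y^2)/n = \sigma_X^2/n_X + \sigma_Y^2/n_Y$. The function names `required_n` and `equivalent_n` and the numeric inputs are hypothetical, chosen only for illustration.

```r
# Required per-group sample size for a two-sided test with equal group sizes.
# (hypothetical helper; delta is the difference in means we want to detect)
required_n <- function(alpha, beta, sigma_x, sigma_y, delta) {
  (qnorm(alpha / 2) + qnorm(beta))^2 * (sigma_x^2 + sigma_y^2) / delta^2
}

# Equal-group sample size that gives the same variance of (mean(X) - mean(Y))
# as n_x samples of X and n_y samples of Y. (hypothetical helper)
equivalent_n <- function(n_x, n_y, sigma_x, sigma_y) {
  (sigma_x^2 + sigma_y^2) / (sigma_x^2 / n_x + sigma_y^2 / n_y)
}

# Illustrative inputs (assumptions): sd 5 and 8, detect a difference of 2
n_req <- required_n(alpha = 0.05, beta = 0.10, sigma_x = 5, sigma_y = 8, delta = 2)
n_eq  <- equivalent_n(n_x = 20, n_y = 40, sigma_x = 5, sigma_y = 8)

ceiling(n_req)   # samples needed in each group
n_eq             # what a 20 / 40 split is "worth" in equal-group terms
```

Comparing `n_eq` with `n_req` gives a rough sense of whether an unbalanced design is adequately powered. When `n_x` equals `n_y`, `equivalent_n` simply returns that common value.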

Example:

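Suppose we expect $X$ ~ Normal($10, 5^2$) and $Y$ ~ Normal($12, 8^2$).

We want to test H1: $\mu_X \neq \mu_Y$ against H0: $\mu_X = \mu_Y$ (to prove that means are different)
with $\alpha = 0.05$ and $\beta = 0.1$ (i.e. a 10% chance of failing to get significant results from the experiment).

If we sample $X$ and $Y$ equally many times, the required number of samples $n$ is

(qnorm(.05/2) + qnorm(.1))^2 * (5^2 + 8^2) / (2)^2   # normal quantiles (qnorm), not pnorm; about 233.8, so n = 234

Maybe we manage to sample $X$ only 20 times but get 40 samples of $Y$.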

X <- rnorm(20, 10, 5) ; Y <- rnorm(40, 12, 8)

In this case, the statistic for testing H0 is

Z <- (mean(X) - mean(Y)) / sqrt( 5^2 / 20  +  8^2 / 40 )
Z

Under H0, Z should be distributed as standard normal. We can compute its two-sided p-value to see whether our difference has reached significance.

2 * pnorm(-abs(Z))

Note to self:

Add some code later which uses bootstrapping to show that the estimated n above gives the correct power for the test.
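A minimal sketch of such a check is given here, with one substitution stated plainly: it uses Monte Carlo simulation from the assumed normal distributions rather than bootstrapping, since there is no observed data set to resample at the planning stage. It assumes the example setup (means 10 and 12, standard deviations 5 and 8, two-sided $\alpha = 0.05$, target power 0.9). The seed and number of repetitions are arbitrary choices.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility

alpha   <- 0.05;  beta    <- 0.10    # significance and type II error targets
sigma_x <- 5;     sigma_y <- 8       # assumed known standard deviations
mu_x    <- 10;    mu_y    <- 12      # assumed true means (difference of 2)

# Per-group sample size from the two-sided normal-quantile formula
n <- ceiling((qnorm(alpha / 2) + qnorm(beta))^2 *
             (sigma_x^2 + sigma_y^2) / (mu_x - mu_y)^2)

reps <- 5000                         # number of simulated experiments (assumed)

# One row per simulated experiment; rowMeans gives the sample means
xbar <- rowMeans(matrix(rnorm(reps * n, mu_x, sigma_x), nrow = reps))
ybar <- rowMeans(matrix(rnorm(reps * n, mu_y, sigma_y), nrow = reps))

# Z statistic and two-sided p-value for each simulated experiment
z <- (xbar - ybar) / sqrt(sigma_x^2 / n + sigma_y^2 / n)
p <- 2 * pnorm(-abs(z))

# Estimated power: fraction of experiments that reach significance
power_hat <- mean(p < alpha)
n
power_hat
```

If the sample-size formula is right, `power_hat` should come out close to the target $1 - \beta = 0.9$, up to simulation noise.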

Comparing Means (unknown σ²: unequal variances)

Just like for the single sample setup, we'll replace $\sigma^2$ by the sample variance $S^2$.

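Test null hypothesis H0: $\mu_X = \mu_Y$

  • $\bar X - \bar Y$ has mean $0$ and approximate variance $S_{\bar X}^2 + S_{\bar Y}^2$
    where $S_{\bar X}^2 = S_X^2 / n_X$ and $S_{\bar Y}^2 = S_Y^2 / n_Y$

Use test statistic

  • $T = \dfrac{\bar X - \bar Y}{\sqrt{S_{\bar X}^2 + S_{\bar Y}^2}}$, which is approximately $t$-distributed with $\nu$ degrees of freedom
  • degrees of freedom is "equivalent sample size of $S_{\bar X}^2$ and $S_{\bar Y}^2$"

    So $\nu = \dfrac{\left(S_{\bar X}^2 + S_{\bar Y}^2\right)^2}{\left(S_{\bar X}^2\right)^2 / (n_X - 1) + \left(S_{\bar Y}^2\right)^2 / (n_Y - 1)}$ (the Welch-Satterthwaite approximation)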

Summary:

Variance is the sum
Degrees of freedom is the average
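To make the "average" concrete, here is a small sketch of the degrees-of-freedom calculation, assuming the usual Welch-Satterthwaite approximation $\nu = (v_X + v_Y)^2 / \left(v_X^2/(n_X - 1) + v_Y^2/(n_Y - 1)\right)$ with $v_X = S_X^2/n_X$ and $v_Y = S_Y^2/n_Y$. The helper name `welch_df` and the numeric inputs are hypothetical.

```r
# Welch-Satterthwaite degrees of freedom (hypothetical helper).
# s2_x, s2_y: sample variances; n_x, n_y: sample sizes.
welch_df <- function(s2_x, n_x, s2_y, n_y) {
  v_x <- s2_x / n_x                # estimated variance of mean(X)
  v_y <- s2_y / n_y                # estimated variance of mean(Y)
  (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))
}

# Illustrative inputs (assumptions): variances 5^2 and 8^2, sizes 20 and 40
nu <- welch_df(s2_x = 5^2, n_x = 20, s2_y = 8^2, n_y = 40)
nu

# With equal variances and equal sample sizes this reduces to 2 * (n - 1)
welch_df(s2_x = 1, n_x = 10, s2_y = 1, n_y = 10)
```

The result always lies between the smaller of $n_X - 1$ and $n_Y - 1$ and the pooled value $n_X + n_Y - 2$, which is one way to read "degrees of freedom is the average".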

Example:

Consider a similar setup as above.

Want to test H1: $\mu_X \neq \mu_Y$ against H0: $\mu_X = \mu_Y$ based off of 20 samples from $X$ and 40 samples from $Y$.

Quantile plots suggest that $X$ and $Y$ are both normal, but with different variances (slopes are unequal).
(Later we'll discuss a hypothesis test to analytically check if variances are different.)

qqX <- qqnorm(X, plot.it = FALSE)       # compute quantile points for X (no plot yet)
qqY <- qqnorm(Y, plot.it = FALSE)       # compute quantile points for Y (no plot yet)

plot(range(qqX$x, qqY$x),               # generate plot box 
	 range(qqX$y, qqY$y),               #  with correct x and y limits
	 type="n", xlab='',ylab='')         #  and nothing inside

points(qqX)                             # plot X quantile points
points(qqY, col = 'red', pch = 3)       # plot Y quantile points

abline(mean(X),sd(X))                   # best fit line for X
abline(mean(Y),sd(Y), col='red')        # best fit line for Y

Since we don't know the underlying variance, we'll use sample variance instead.

Our test statistic is

T <- (mean(X) - mean(Y)) / sqrt( sd(X)^2 / 20  +  sd(Y)^2 / 40 )
T
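To finish the test, the statistic needs a reference distribution. The sketch below assumes the Welch-Satterthwaite degrees of freedom, computes a two-sided p-value from the $t$ distribution, and cross-checks the result against R's built-in `t.test`, whose default (`var.equal = FALSE`) is the Welch test. It regenerates its own `X` and `Y` with an arbitrary seed so that it runs on its own. The names `t_stat`, `nu`, and `p_value` are illustrative.

```r
set.seed(1)                                  # arbitrary seed, only for reproducibility
X <- rnorm(20, 10, 5); Y <- rnorm(40, 12, 8) # same setup as the example

v_x <- var(X) / length(X)                    # estimated variance of mean(X)
v_y <- var(Y) / length(Y)                    # estimated variance of mean(Y)

# Test statistic: difference in sample means over its estimated standard error
t_stat <- (mean(X) - mean(Y)) / sqrt(v_x + v_y)

# Welch-Satterthwaite degrees of freedom
nu <- (v_x + v_y)^2 / (v_x^2 / (length(X) - 1) + v_y^2 / (length(Y) - 1))

# Two-sided p-value from the t distribution with nu degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = nu)

# Cross-check against the built-in Welch test
ref <- t.test(X, Y)
c(t_stat, nu, p_value)
c(ref$statistic, ref$parameter, ref$p.value)
```

The two printed vectors should agree. In practice `t.test(X, Y)` alone is enough, and the manual version is only there to show where each number comes from.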