
Even more two sample tests!

Comparing variance of two samples (F-test)

  • When comparing means, we looked at the difference of sample means, X̄ − Ȳ.
  • For comparing variances, instead we'll use the quotient of sample variances, S_X² / S_Y².

Recall. The χ²(k) distribution is the sum of squares of k independent standard Normal random variables. We used this previously to make confidence intervals for σ² as well as one sample hypothesis tests for variance, goodness-of-fit, and independence.

  •  (n − 1) S² / σ²  ~  χ²(n − 1)    (testing variance)
  •  Σ (observed − expected)² / expected  ~  χ²    (goodness-of-fit & independence)
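As a quick sanity check on the recalled fact, we can simulate sums of squared standard normals and compare them with R's chi-squared functions (the number of replications here is an arbitrary choice):

```r
# Sum of k squared standard normals should follow chi-squared with k df.
set.seed(1)
k <- 5
sums <- replicate(10000, sum(rnorm(k)^2))
mean(sums)          # should be near E[chisq(k)] = k = 5
quantile(sums, .9)  # should be near the theoretical 90th percentile
qchisq(.9, 5)
```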

Now. We will use the F distribution. F(m, n) is the distribution of the quotient of two independent ("normalized") chi-squared random variables,

    F = (U / m) / (V / n),

where U and V have m and n degrees of freedom. Recall E[χ²(m)] = m, so dividing by m normalizes U to have mean 1, etc.

F(m, n) has two parameters,
  m = numerator degrees of freedom
  n = denominator degrees of freedom
The F distribution has mean n / (n − 2) (for n > 2).

Note. Reversing the degrees of freedom flips the F distribution: if W is F(m, n), then 1/W is F(n, m).
Similar to χ², the F distributions are always positive and skewed to the right.
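The two facts above are easy to check numerically in R (the degrees of freedom 6 and 20 here are an arbitrary illustration):

```r
# (1) The mean of F(m, n) is n/(n-2) for n > 2.
# (2) Reversing the degrees of freedom flips the distribution,
#     i.e. qf(p, m, n) = 1 / qf(1-p, n, m).
set.seed(42)
m <- 6; n <- 20
n / (n - 2)             # theoretical mean of F(6, 20)
mean(rf(100000, m, n))  # simulated mean, should be close
qf(.1, m, n)            # lower quantile of F(6, 20)
1 / qf(.9, n, m)        # same number via the flip identity
```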

Assumption:
  • X and Y are independent, normal with means μ_X, μ_Y and variances σ_X², σ_Y²
Setup:
  • X is sampled n times: X_1, …, X_n   ⟶ sample variance S_X²
  • Y is sampled m times: Y_1, …, Y_m   ⟶ sample variance S_Y²
Null Hypothesis:
  • H0: σ_X² = σ_Y²
Statistic:

    F = S_X² / S_Y²
If the null hypothesis is true, then σ_X² = σ_Y², so

    F  =  S_X² / S_Y²
       =  (S_X² / σ_X²) / (S_Y² / σ_Y²)

is distributed as F(n − 1, m − 1), with n − 1 and m − 1 degrees of freedom (in the numerator and denominator).

(Note that flipping F to 1/F swaps the numerator and denominator degrees of freedom.)

P-value:

The p-value is either 2 P(F(n − 1, m − 1) ≤ F) if F < 1, or 2 P(F(n − 1, m − 1) ≥ F) if F > 1.
(Note that we can arrange for the p-value to be a left-tail probability by swapping X for Y if necessary. This is equivalent to letting X be the sample with smaller sample variance.)
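The rule above amounts to doubling the smaller tail probability. A minimal helper sketching this (f.pvalue is a hypothetical name, not part of base R):

```r
# Two-sided F-test p-value: twice the smaller of the two tail probabilities.
f.pvalue <- function(f, df1, df2) {
  2 * min(pf(f, df1, df2), 1 - pf(f, df1, df2))
}
```

For a statistic exactly at the median of F(k, k) the two tails are equal and the p-value is 1, as expected.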

Cutoffs:

The cutoff values (boundaries for a confidence interval at level 1 − α) are:

    qf(α/2, n − 1, m − 1)
    qf(1 − α/2, n − 1, m − 1)

Note that for example qf(.1, 6, 20) = 1 / qf(.9, 20, 6)

Example:

Suppose that we get 20 samples of X and 40 samples of Y. We want to show that X and Y have different variances.

X <- rnorm(20, 20, 8)
Y <- rnorm(40, 24, 5)

Our statistic will be F = S_X² / S_Y².

F <- sd(X)^2 / sd(Y)^2  
F

I prefer to take left-tail probabilities, so I'm going to flip everything.

F <- 1/F
F

The p-value will be 2 P(F(39, 19) ≤ F), twice a left-tail probability.

Since I flipped F, I should be careful to swap the numerator and denominator degrees of freedom when I compute my p-value.

p <- 2*pf(F,39,19)
p

The command to do this in R is

var.test(...)

var.test(X,Y)
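As a check that the by-hand computation matches var.test, we can recompute the two-sided p-value as twice the smaller tail probability and compare (the seed here is an arbitrary choice for reproducibility, so these draws differ from the ones above):

```r
# Compare the manual F-test p-value with var.test's reported p-value.
set.seed(3)
X <- rnorm(20, 20, 8)
Y <- rnorm(40, 24, 5)
f <- var(X) / var(Y)                              # var(X) equals sd(X)^2
p.manual <- 2 * min(pf(f, 19, 39), 1 - pf(f, 19, 39))
p.manual
var.test(X, Y)$p.value                            # should agree
```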


Comparing two population proportions.

Let's return to our earlier example of population proportions. Previously we did a one-sample test comparing a sample proportion p̂ with an assumed population proportion p₀. Now we will suppose that we have two populations which we have sampled to get

    p̂₁ = X / n
    p̂₂ = Y / m

and we want to test the two population proportions against equality.

Setup:
  • Perform n Bernoulli trials on one population to get X successes, p̂₁ = X / n
  • Perform m Bernoulli trials on another population to get Y successes, p̂₂ = Y / m

(Note that p̂₁ and p̂₂ are the sample means of the corresponding Bernoulli trials.)

Assumptions:
  • X is Binomial(n, p₁)
  • Y is Binomial(m, p₂)
Null Hypothesis:
  • H0: p₁ = p₂
Statistic:

Our textbook suggests that p̂₁ − p̂₂ should be approximately normal with
    mean       0                                   (if H0 is true)
    variance   p (1 − p) (1/n + 1/m)               (where p = p₁ = p₂)
Our textbook suggests pooling to estimate p̂ = (X + Y) / (n + m), and using this to estimate the standard error

    SE = sqrt( p̂ (1 − p̂) (1/n + 1/m) ),

which can then be used to standardize the statistic.

  • Z = (p̂₁ − p̂₂) / SE   is approximately standard normal (under the null hypothesis)
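A minimal sketch of this pooled z statistic (two.prop.z is a hypothetical helper name). Squaring it recovers the chi-squared statistic that prop.test reports when the continuity correction is turned off, which is a handy cross-check:

```r
# Pooled two-proportion z statistic: x successes in n trials vs. y in m.
two.prop.z <- function(x, n, y, m) {
  phat <- (x + y) / (n + m)                      # pooled estimate of p
  se   <- sqrt(phat * (1 - phat) * (1/n + 1/m))  # estimated standard error
  (x/n - y/m) / se
}

z <- two.prop.z(21, 50, 53, 80)
z^2
prop.test(c(21, 53), c(50, 80), correct = FALSE)$statistic  # should match z^2
```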
Example:

Suppose that we make 50 measurements of X finding 21 occurrences and make 80 measurements of Y finding 53 occurrences.

X <- 21; n <- 50
Y <- 53; m <- 80
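Carrying the computation through with the formulas above (a sketch of the remaining steps, using the counts just defined):

```r
# Pooled estimate, standard error, z statistic, and two-sided p-value.
X <- 21; n <- 50
Y <- 53; m <- 80
phat <- (X + Y) / (n + m)
se   <- sqrt(phat * (1 - phat) * (1/n + 1/m))
z    <- (X/n - Y/m) / se
z
2 * pnorm(-abs(z))   # two-sided normal p-value
```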