More Two Sample Tests
In this lecture we'll discuss the nonparametric Wilcoxon rank sum test as well as the paired sample t-test.
Wilcoxon Signed-Rank Test (one sample test)
First, let's briefly recall the one sample "Wilcoxon Signed-Rank Test". This is a nonparametric test used on data that we do not assume to be normally distributed; it tests whether the distribution is centered at a hypothesized value $\mu_0$.
- H0: the distribution is symmetric about $\mu_0$ (roughly: the median is $\mu_0$).

For the signed rank test, we sort the sample values by the size of their absolute deviations $|D_i| = |X_i - \mu_0|$, assigning each a rank $r_i$.
- Statistic: $W = \min(W^+, W^-)$, where $W^+$ (resp. $W^-$) is the sum of the ranks of the positive (resp. negative) deviations.

For small $n$ the exact distribution of $W$ is used; for larger $n$, $W$ is approximately normal with mean $n(n+1)/4$ and standard deviation $\sqrt{n(n+1)(2n+1)/24}$.
X <- sample.int(100,50,replace=TRUE) # generate 50 random integers between 1 and 100
wilcox.test(X,mu=60) # Wilcox Signed Rank Test against mu=60
DataCamp Workspaces has recently integrated AI code generation into workspaces. To show off this ability (and be lazy), I'll ask DataCamp to generate demonstration code for the Wilcoxon Signed-Rank Test by hand below:
# Here's an example of how to perform a Wilcoxon Signed Rank Test by hand in R:
# First, let's generate some sample data
X <- sample.int(100,50,replace=TRUE)
# Test against mu = 60
mu <- 60
# Next, we'll calculate the difference from mu
D <- X - mu
# Then, we'll rank the absolute values of the differences
r <- rank(abs(D))
# We'll calculate the sum of the ranks for positive differences
Wp <- sum(r[D > 0])
# And the sum of the ranks for negative differences
Wn <- sum(r[D < 0])
# Finally, we'll calculate the test statistic
W <- min(Wp, Wn)
# We can calculate the p-value using the Wilcoxon Signed Rank Test distribution
n <- length(D)
p <- 2 * pnorm( W, mean = (n * (n + 1)) / 4, sd = sqrt(n * (n + 1) * (2 * n + 1) / 24))
# Let's print out the results
cat("Wilcoxon Signed Rank Test\n")
cat("------------------------\n")
cat("Sample size: ", n, "\n")
cat("Test statistic: ", W, "\n")
cat("p-value: ", p, "\n")
That is pretty awesome!!!!
To be fair, the code above is not exactly the code that the DataCamp AI generated. I tweaked it a bit to more closely match my first example. There was also a mysterious error in the way it computed the p-value. The original computation generated by the AI was
p <- 2 * pnorm(-abs(W), mean = (n * (n + 1)) / 4, sd = sqrt(n * (n + 1) * (2 * n + 1) / 24))
The command
pnorm(-abs(W))
would work to compute two-sided p-values after standardization (i.e. for a statistic with mean 0), but it doesn't work for normal distributions that don't have mean 0: here -abs(W) sits far below the mean $n(n+1)/4$, so the computed tail area is essentially zero. Anyway, since we already took W <- min(Wp,Wn), we know W will be at or to the left of the mean, so we can just apply pnorm to W directly.
p <- 2 * pnorm( W , mean = (n * (n + 1)) / 4, sd = sqrt(n * (n + 1) * (2 * n + 1) / 24))
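To double check the fix, here's a small sketch comparing the corrected hand computation with wilcox.test. The data below are an assumption of mine, chosen so the absolute deviations are all distinct (no zeros, no ties); exact = FALSE and correct = FALSE put wilcox.test on the same normal approximation without the continuity correction:

```r
# Sketch: verify the corrected normal-approximation p-value against
# R's wilcox.test. Data chosen so |X - mu| has no zeros and no ties.
mu <- 60
X <- mu + c(1, -2, 3, -4, 5, -6, 7, -8, 9, -10)  # distinct absolute deviations
D <- X - mu
r <- rank(abs(D))
W <- min(sum(r[D > 0]), sum(r[D < 0]))
n <- length(D)
p_hand <- 2 * pnorm(W, mean = n*(n+1)/4, sd = sqrt(n*(n+1)*(2*n+1)/24))
# exact = FALSE forces the normal approximation; correct = FALSE turns off
# the continuity correction that our hand computation omits
p_R <- wilcox.test(X, mu = mu, exact = FALSE, correct = FALSE)$p.value
c(p_hand, p_R)   # the two p-values agree
```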
Wilcoxon Rank-Sum Test (two sample test)
The two sample version of the Wilcoxon test is essentially computing whether the values of one sample tend to be larger than those of the other, again without assuming normality.
- H0: the two samples are drawn from the same distribution (in particular, the medians are equal).

For this test, we will combine the sample values $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ into a single pooled sample and rank the pooled values.

If there are $n$ values of $X$ and $m$ values of $Y$, the statistic $W_X$ is the sum of the ranks of the $X$ values; likewise $W_Y$ is the sum of the ranks of the $Y$ values, so $W_X + W_Y = (n+m)(n+m+1)/2$.

If H0 is true, the ranks of the $X$ values are a uniformly random size-$n$ subset of $\{1,\dots,n+m\}$, so $W_X$ has expected value $n(n+m+1)/2$.

Once again, for small values of $n$ and $m$ the exact distribution of the statistic is used; for larger samples, $W_X$ is approximately normal with mean $n(n+m+1)/2$ and standard deviation $\sqrt{nm(n+m+1)/12}$.
Note. This reduces to the Signed-Rank Test if we let the second sample be the first reflected about $\mu_0$, i.e. $Y_i = 2\mu_0 - X_i$.
X <- sample.int(100,20,replace=TRUE) # generate 20 random integers between 1 and 100
Y <- sample.int(80, 30,replace=TRUE) # generate 30 random integers between 1 and 80
wilcox.test(X,Y) # Wilcox Rank-Sum test of X vs Y
Compare this to a computation by hand (using the same data)
n <- length(X) # get length of X
m <- length(Y) # get length of Y
r <- rank(c(X,Y)) # make vector of ranks of X,Y values
Wx <- sum(r[1:n]) # sum up the ranks of X values
Wy <- (n+m)*(n+m+1) / 2 - Wx # sum of ranks of Y values (total minus Wx)
W <- min(Wx,Wy) # report the smaller rank sum as the statistic
mu <- ifelse(W == Wx, n, m) * (n+m+1) / 2 # mean of whichever rank sum we kept
p <- 2*pnorm( -abs(W-mu), mean = 0, sd = sqrt(n*m*(n+m+1)/12) )
cat("Wilcoxon Rank-Sum Test\n")
cat("------------------------\n")
cat("Sample sizes: ", n, "&", m, "\n")
cat("Test statistic: ", W, "\n")
cat("p-value: ", p, "\n")
The code above gives a slightly different p-value than wilcox.test. Since wilcox.test reports the statistic $U = W_X - n(n+1)/2$ and by default applies a continuity correction (plus a tie correction to the variance when there are ties), the two computations will not agree exactly.
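When there are no ties and the continuity correction is switched off, the hand computation and wilcox.test do agree. Here is a sketch, with samples constructed tie-free on purpose (odd values vs even values, an assumption made just for this demo):

```r
# Sketch: the normal-approximation p-value computed by hand matches
# wilcox.test once ties and the continuity correction are removed.
set.seed(1)
X <- sample(seq(1, 199, by = 2), 20)   # odd values  -> no ties with Y
Y <- sample(seq(2, 200, by = 2), 30)   # even values
n <- length(X); m <- length(Y)
Wx <- sum(rank(c(X, Y))[1:n])          # rank sum of the X values
p_hand <- 2 * pnorm(-abs(Wx - n*(n+m+1)/2), sd = sqrt(n*m*(n+m+1)/12))
p_R <- wilcox.test(X, Y, exact = FALSE, correct = FALSE)$p.value
c(p_hand, p_R)   # identical up to floating point
```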
Paired Sample t-Test
Frequently we wish to test for a difference of means in data which is not independent. For example, testing before/after effects, where two measurements are made on the same person: one before a treatment and one after. Or testing opinion differences, where each person is asked their opinion of two different items. In each of these cases, the fact that both measurements come from the same source introduces possible correlation, violating the independence assumptions of the two sample t-test.
Setup:
- Input is an independent set of pairs of samples $(X_1,Y_1),\dots,(X_n,Y_n)$, where $X_i$ and $Y_i$ may be correlated.
- Wish to test against H0: $\mu_X = \mu_Y$,
or more generally H0: $\mu_X - \mu_Y = \mu_0$.
Idea:
- For the two sample $t$-test before, we used the statistic $\bar{X} - \bar{Y}$ (difference of means).
- Now we will use $\bar{D}$, where $D_i = X_i - Y_i$ (mean of differences).
Plan:
- Convert pairs to differences $D_i = X_i - Y_i$.
- Do a single sample $t$-test on $D$.
X <- rnorm(10, 20, 8) # generate some X samples
Y <- X + rnorm(10, 3, 4) # Y ≈ X+3 (not independent)
curve(dnorm(x,20,8),from = 0, to = 40, ylab='') # plot the distribution of X
curve(dnorm(x,23,sqrt(64+16)), add=TRUE, col='red') # plot the distribution of Y
legend('topleft', col = c('black','red'),
lty = c( 1 , 1 ),
legend = c('X', 'Y' ))
t.test(X, Y, paired=TRUE)
Note that paired two sample t-tests are really just one sample t-tests applied to the differences:
t.test(X-Y)
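A quick sanity check, re-simulating data of the same shape as above, confirms that the two calls return identical results:

```r
# Sketch: a paired two sample t-test is literally a one sample t-test on X - Y.
set.seed(3)                      # seed chosen arbitrarily for reproducibility
X <- rnorm(10, 20, 8)
Y <- X + rnorm(10, 3, 4)
paired  <- t.test(X, Y, paired = TRUE)
onesamp <- t.test(X - Y)
c(paired$statistic, onesamp$statistic)   # same t statistic
c(paired$p.value,   onesamp$p.value)     # same p-value
```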
Paired Sample t-test vs Two Sample t-test
The paired sample $t$-test takes the differences $D_i = X_i - Y_i$ and tests
- H0: $\mu_D = 0$ (equivalently, $\mu_X = \mu_Y$).

The "mean difference" statistic used in the paired t-test is identical to the "difference of means" statistic used in the two sample t-test:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^n (X_i - Y_i) = \bar{X} - \bar{Y}.$$
Since $X_i$ and $Y_i$ may be correlated, its variance is
$$\mathrm{Var}(\bar{D}) = \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{n} - \frac{2\,\mathrm{Cov}(X,Y)}{n}.$$
Recall that the first two terms above are exactly the variance used in the two sample t-test when X and Y are each sampled $n$ times.
This is the benefit of the paired $t$-test: when $X$ and $Y$ are positively correlated, the covariance term shrinks $\mathrm{Var}(\bar{D})$, giving a more powerful test than treating the samples as independent.
Note that, in cases where $X$ and $Y$ are negatively correlated, the covariance term instead inflates the variance, and the paired test can be less powerful than a two sample test on independent samples would be.
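The covariance effect is easy to see by simulation under the setup of the earlier example ($X \sim N(20, 8^2)$, $Y = X + N(3, 4^2)$); the "unpaired" quantity below is the variance an unpaired analysis would effectively work with:

```r
# Sketch: with positively correlated X and Y, Var(X - Y) is much smaller
# than Var(X) + Var(Y), which is what an unpaired two sample test sees.
set.seed(42)                        # seed chosen arbitrarily
sims <- replicate(5000, {
  X <- rnorm(10, 20, 8)
  Y <- X + rnorm(10, 3, 4)          # Y strongly positively correlated with X
  c(paired = var(X - Y), unpaired = var(X) + var(Y))
})
rowMeans(sims)   # paired ~ 16 = 4^2, unpaired ~ 144 = 8^2 + (8^2 + 4^2)
```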