
2^k Factorial Experiments ("Screening Experiments")

A $2^k$ factorial design is intended for screening experiments. As a preliminary step in an experimental investigation, we may brainstorm all possible factors that could reasonably have some effect on an output. This may result in a huge number of factors. A $2^k$ experiment is then used to narrow down this list of factors to only the most important factors and interactions. Later on, after narrowing down the list of factors, we'll perform another, more thorough test to investigate more precisely the effect of different factor levels and interactions.

Because this is only an initial test of a large number of factors, we will consider only 2 levels of each factor. If there are $k$ factors, then this results in $2^k$ different combinations of factor levels, which is the origin of the name "$2^k$ Experiment". Gathering $n$ replicates of data for each of the $2^k$ factor combinations yields $N = n \cdot 2^k$ total observations.
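For instance, here is a small R sketch (not from the original notes; the numbers are made up) that lists all the factor-level combinations and counts the observations:

k <- 3                                                  # number of factors
combos <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
nrow(combos)   # 2^3 = 8 combinations of factor levels
n <- 5         # hypothetical number of replicates per combination
n * 2^k        # total number of observations N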

Model Change

Since we have only 2 levels for each factor, we will modify our point of view. We will think of the two levels as a "control" (or "low") level and a "treatment" (or "high") level. Previously we defined treatment effects as deviations away from the global mean; but now we will define them as changes away from the "control" value. Numerically this means that our effects are twice as big as they were before (since the mean lies half-way between the "control" and "treatment" values).

We will use the notation "$-$" and "$+$" for the "low" and "high" levels of each factor. Alternately, if we want to plot things, we will use the values "$-1$" and "$+1$". We can even use this to scale factors and work numerically (for example, to do regression analysis). Note that even if our factor levels were already numerical (for example, two specific temperatures), we would still rescale them to $-1$ and $+1$ for this analysis.
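A tiny sketch of this rescaling in R (the temperature values here are hypothetical, just to illustrate):

## rescale two numeric factor levels to -1 / +1
temp  <- c(100, 150, 150, 100, 150, 100)      # hypothetical raw temperatures
coded <- ifelse(temp == max(temp), +1, -1)    # high level -> +1, low level -> -1
coded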

In the notation from before, our observations are written $y_{ijk\ell}$, with one subscript per factor giving its level ("$-$" or "$+$") and a final subscript for the replicate. For example

  • $y_{---\ell}$ are the replicates of the control.
  • $y_{+--3}$ is the third replicate observation where only treatment $A$ is applied (and other treatments are at their "control" level).
  • $y_{++-2}$ is the second replicate observation where only treatments $A$ and $B$ are applied.

In our notation from before, averages would be written with subscripts and superscripts indicating the factor and level ("control" = "$-$" or "treatment" = "$+$") to fix during the averaging. For example

  • $\bar{y}_{A^{+}}$ averages over the observations where the treatment $A$ was applied.
  • $\bar{y}_{A^{-}}$ averages over the observations where the treatment $A$ was at "control" level.
  • $\bar{y}_{A^{+}B^{-}}$ averages over observations where $A$ was applied but $B$ was not.

In the work below we will use sums rather than averages. We will use this same notation, without the "bar".

Yates Notation.

One of the big obstacles when dealing with a large number of factors is the logistical bookkeeping required to organize and perform tests for interactions on the resulting large dataset. A beautiful organizational method was developed by F. Yates in the 1930s to help with this. Yates' notation was designed to help with computations by hand... so it isn't quite as important now that analysis is done by computer, but it is still standard for discussing and organizing data in $2^k$ experiments.

Suppose that the factors are each indicated by some (upper-case) letter: $A$, $B$, $C$, etc. In Yates notation we write "$(1)$" for observations where all factors are at the "control" level, and label other combinations of levels of factors using a product of (lower-case) letters indicating the factors at their treatment level (i.e. not at the control level).

For example, "" corresponds to an observation where treatments and are applied, but other factors are at their control value.

There's a fairly standard picture of the "geometric" view for 3 factors (a cube with the eight treatment combinations at its vertices) that you can quickly find with a Google search.

In general, we are looking at vertices of a $k$-dimensional cube -- i.e. a $k$-dimensional array with two values in each dimension. You can think of $a$, $b$, $c$ similarly to the direction vectors $\hat{i}$, $\hat{j}$, $\hat{k}$ used in 3d vector calculus and physics.

In analysis, we'll reuse this notation... we will use the product-of-lower-case-letters notation to represent the sum of all replicates of a certain type. So $ab$ will be the sum of all observations where treatments $A$ and $B$ were applied, but other treatments were at their control level. We write "$1$" (or sometimes with parentheses "$(1)$") for the sum of all replicates with all treatments at the control level.

We also use a product of upper-case letters to represent the main and interaction effects themselves ($A$, $B$, $AB$, and so on).
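As a small sketch (not part of the original notes), we can generate the Yates labels for a $2^3$ experiment in R:

## Yates labels for all combinations of 3 factors
g <- expand.grid(a = 0:1, b = 0:1, c = 0:1)   # 0 = control, 1 = treatment
labels <- apply(g, 1, function(r) {
  s <- paste0(names(g)[r == 1], collapse = "")
  if (s == "") "(1)" else s
})
labels   # "(1)" "a" "b" "ab" "c" "ac" "bc" "abc"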

Contrasts

We want to look at sums of squares and mean squares. But it is helpful to introduce new notation for a "half-way" step first. The main benefit of this is to get simpler formulas for the interaction sums of squares (which were previously somewhat tedious).

Main Contrast

The contrast for a main effect is the total difference in value due to applying a treatment.

  • This can be expressed nicely in terms of the lower-case notation from earlier (see the short R sketch after this list). For example
  • For one factor $A$:

    $\text{Contrast}_A = a - (1)$

  • For two factors $A$ and $B$:

    $\text{Contrast}_A = ab + a - b - (1)$

  • For three factors $A$, $B$, and $C$:

    $\text{Contrast}_A = abc + ab + ac + a - bc - b - c - (1)$
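For instance, a small R sketch of the three-factor formula, using hypothetical Yates totals (these numbers are made up, just to illustrate):

## main contrast for A in a 2^3 experiment, from hypothetical Yates totals
tot <- c("(1)" = 40, a = 55, b = 42, ab = 60, c = 41, ac = 57, bc = 44, abc = 63)
contrastA <- sum(tot[c("a", "ab", "ac", "abc")]) - sum(tot[c("(1)", "b", "c", "bc")])
contrastA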
Two Way Interaction Contrast

The contrast for a two-way interaction effect is the difference of differences.

  • It is not hard to show that $\text{Contrast}_{AB} = (\text{Contrast}_A \text{ at the high level of } B) - (\text{Contrast}_A \text{ at the low level of } B)$. Using Yates notation this looks like the following.
  • For two factors $A$ and $B$:

    $\text{Contrast}_{AB} = ab - a - b + (1)$

  • For three factors $A$, $B$, and $C$:

    $\text{Contrast}_{AB} = abc + ab + c + (1) - ac - bc - a - b$
Three Way Interaction (and Higher) Contrast

Contrasts for three-way and higher interactions are defined similarly. The three-way interaction contrast is the triple difference.

  • Note that the sign of each term is the product of the $\pm$ signs of the corresponding factor levels for that treatment combination.

For example

  • For three factors $A$, $B$, and $C$:

    $\text{Contrast}_{ABC} = abc + a + b + c - ab - ac - bc - (1)$

  • For four factors $A$, $B$, $C$, and $D$, the same rule gives the four-way contrast (an R sketch follows this list):

    $\text{Contrast}_{ABCD} = abcd + ab + ac + ad + bc + bd + cd + (1) - abc - abd - acd - bcd - a - b - c - d$
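Continuing with the same hypothetical totals as before, the interaction contrasts only change which totals get a plus sign and which get a minus sign:

## two-way and three-way interaction contrasts from the same hypothetical 2^3 totals
tot <- c("(1)" = 40, a = 55, b = 42, ab = 60, c = 41, ac = 57, bc = 44, abc = 63)
contrastAB  <- sum(tot[c("(1)", "ab", "c", "abc")]) - sum(tot[c("a", "b", "ac", "bc")])
contrastABC <- sum(tot[c("a", "b", "c", "abc")])    - sum(tot[c("(1)", "ab", "ac", "bc")])
c(AB = contrastAB, ABC = contrastABC)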
Converting Contrast to Effect

The main and interaction effects can be computed from contrasts (since these are the average value of the corresponding differences):

$A = \dfrac{\text{Contrast}_A}{n\,2^{k-1}}$

$AB = \dfrac{\text{Contrast}_{AB}}{n\,2^{k-1}}$

  • etc
Converting Contrast to Sum of Square

Looking back at our earlier work, we can connect contrasts to the sum of squares values (treatment and interaction effects) from before.

$SS_A = \dfrac{(\text{Contrast}_A)^2}{n\,2^{k}} = \dfrac{(\text{Contrast}_A)^2}{N}$

$SS_{AB} = \dfrac{(\text{Contrast}_{AB})^2}{n\,2^{k}} = \dfrac{(\text{Contrast}_{AB})^2}{N}$

  • etc.

(Recall that the total number of observations is $N = n \cdot 2^k$.)

This is very nice, because the high-order interaction sums of squares were complicated to compute in our previous formulation.

Note: Each of these has only 1 degree of freedom (since there were only 2 levels), thus their "mean square" value equals their "sum of square" value.
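As a tiny illustration of both conversions (continuing the hypothetical contrast from the sketches above, with $k = 3$ factors and $n = 4$ replicates):

## convert a contrast to an effect and to a sum of squares
k <- 3; n <- 4
contrastA <- 68                            # hypothetical A contrast from earlier
effectA   <- contrastA   / (n * 2^(k-1))   # average change due to treatment A
SSA       <- contrastA^2 / (n * 2^k)       # sum of squares (also the mean square, 1 df)
c(effect = effectA, SS = SSA)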

The Design Matrix

The benefit of using contrasts as an intermediate step is that there is a simple matrix multiplication rule which can be used to quickly compute all of the contrasts for your dataset! Especially for the interaction effects, this is a great time-saver.

Each contrast is a sum over all of the $2^k$ combinations of levels of factors. For a main contrast, like $\text{Contrast}_A$, an observation is added with coefficient $+1$ if it is in $A^{+}$ (if it includes treatment $A$) and with coefficient $-1$ if it is in $A^{-}$ (if it does not include treatment $A$ -- i.e. if $A$ is at the control level). In terms of Yates notation, a product of treatments gets coefficient $+1$ if it includes $a$ and $-1$ otherwise.

For example

  • For $\text{Contrast}_A$ (with three factors), observations $a$, $ab$, $ac$, $abc$ are all $+1$.
  • For $\text{Contrast}_A$, observations $(1)$, $b$, $c$, $bc$ are all $-1$.

For two-way interaction contrasts the coefficient of a term is the product of the coefficients for the two corresponding main contrasts. For example

  • For $\text{Contrast}_{AB}$ (with three factors), observations $(1)$, $ab$, $c$, $abc$ are all $+1$.
  • For $\text{Contrast}_{AB}$, observations $a$, $b$, $ac$, $bc$ are all $-1$.

Three-way and above also have coefficients given by products of the main contrast coefficients.

If we organize our work carefully then the coefficients make a pretty table.

Levels    A    B    C
(1)      -1   -1   -1
a        +1   -1   -1
b        -1   +1   -1
ab       +1   +1   -1
c        -1   -1   +1
ac       +1   -1   +1
bc       -1   +1   +1
abc      +1   +1   +1

The coefficients follow a nice pattern -- they are the binary expansions of the numbers 0--7 (with each 0 replaced by $-1$)!

Note: in each column above, half of the numbers are $+1$ and half are $-1$. The sum of the values in each column is 0.

Note: each of the columns above is orthogonal to the others -- the dot product of any column with another is $0$.

Once we have these coefficients, we can multiply them to get coefficients for the interaction effects.

Levels    A    B    C   AB   AC   BC   ABC
(1)      -1   -1   -1   +1   +1   +1   -1
a        +1   -1   -1   -1   -1   +1   +1
b        -1   +1   -1   -1   +1   -1   +1
ab       +1   +1   -1   +1   -1   -1   -1
c        -1   -1   +1   +1   -1   -1   +1
ac       +1   -1   +1   -1   +1   -1   -1
bc       -1   +1   +1   -1   -1   +1   -1
abc      +1   +1   +1   +1   +1   +1   +1

The comments above about sum of column values and orthogonality of columns extend to the entire table of products. Although we haven't expressed it this way, from a linear algebra point of view, converting to contrasts is equivalent to making a change of basis.

If we wanted to, I'm pretty sure we could express this entire chapter in terms of discrete Fourier transforms. In fact, if we had a massive amount of data, then we could probably speed up the computation of all contrasts by using something like a fast Fourier transform. ... though you'd need a really ridiculous amount of data or factors in order for anyone to care about this, given current computer speeds.

## 2^3 Factorial design matrix

F   <- sapply(0:7, function(x) as.numeric(intToBits(x)))
			  
## contrast coefficient table is binary expansion			  
A   <- F[1,]*2 - 1
B   <- F[2,]*2 - 1
C   <- F[3,]*2 - 1			  

## coefficients for interactions are products			  
AB  <- A * B
AC  <- A * C 
BC  <- B * C 
ABC <- A * B * C 

## full contrast coefficient table			  
data.frame(A,B,C,AB,AC,BC,ABC,
		   row.names=c("1","a","b","ab","c","ac","bc","abc"))
## Rows are:
# cba
#------
# 000 -> (1)
# 001 ->  a
# 010 ->  b
# 011 -> ab
# 100 ->  c
# 101 -> ac
# 110 -> bc
# 111 -> abc			  
## We can also construct this table using arrays
fA <-       array(c(-1,1), dim=rep(2,3))
fB <- aperm(array(c(-1,1), dim=rep(2,3)), c(3,1,2))
fC <- aperm(array(c(-1,1), dim=rep(2,3)), c(2,3,1))

## full contrast coefficient table			  
data.frame(A  = as.vector(fA),
		   B  = as.vector(fB),
		   C  = as.vector(fC),
		   AB = as.vector(fA*fB),
		   AC = as.vector(fA*fC),
		   BC = as.vector(fB*fC),
		   ABC= as.vector(fA*fB*fC),
		   row.names=c("1","a","b","ab","c","ac","bc","abc"))

Once we have vectors of coefficients, we can use dot products with our data in order to make contrasts, which can then be converted to mean square values and used for $F$-tests.

Example:
## Generate some data... let's do something with 4 factors
control <- 5   # control value
tA  <- 2/2     # treatment A effect
tB  <- 2/2     # treatment B effect
tC  <- 0       # treatment C effect (nonexistent)
tD  <- 2/2     # treatment D effect
tAB <- 2/2     # AB interaction
tAD <- 0       # no AD interaction
tBD <- 2/2     # BD interaction
tABD<- 0       # no ABD interaction
#  maybe all of the other interactions are 0...

n <- 50  # lets use 50 replicates

## Combine control and effects to get expected values
Y <-   array(control, rep(2,4)) +        # 4 dimensions, each with 2 values
       array(c(-tA,tA), rep(2,4)) +               # A
 aperm(array(c(-tB,tB), rep(2,4)), c(4,1,2,3)) +  # B
 aperm(array(c(-tC,tC), rep(2,4)), c(3,4,1,2)) +  # C
 aperm(array(c(-tD,tD), rep(2,4)), c(2,3,4,1)) +  # D
       array(c(tAB,-tAB,-tAB,tAB), rep(2,4)) +    # AB
 aperm(array(c(tAD,-tAD,-tAD,tAD), rep(2,4)), c(1,3,4,2)) + # AD
 aperm(array(c(tBD,-tBD,-tBD,tBD), rep(2,4)), c(3,1,4,2))   # BD


## Now repeat for the n replicates and add random error
Y <- as.vector(rep(Y, n))  + rnorm(n*16, 0, 3)

## Make factor vectors
A <-       array(c(-1,1), dim=rep(2,4))
B <- aperm(array(c(-1,1), dim=rep(2,4)), c(4,1,2,3))
C <- aperm(array(c(-1,1), dim=rep(2,4)), c(3,4,1,2))
D <- aperm(array(c(-1,1), dim=rep(2,4)), c(2,3,4,1))

# include replicates
A <- as.vector(rep(A, n))
B <- as.vector(rep(B, n))
C <- as.vector(rep(C, n))
D <- as.vector(rep(D, n))

data <- data.frame(Y,A,B,C,D)
data

## Now we can compute contrasts!

contrasts <- 
  data.frame(A=sum(Y*A), B=sum(Y*B), C=sum(Y*C), D=sum(Y*D), 
	  	     AB=sum(Y*A*B), AC=sum(Y*A*C), AD=sum(Y*A*D), 
		     BC=sum(Y*B*C), BD=sum(Y*B*D), CD=sum(Y*C*D) )
## Ugh.. this is a lot of contrasts, let's not include any three-way or higher ones...


cat('contrasts:')
contrasts

## We can also compute effects and mean squares!

cat('computed effects:')
contrasts / (n*8)

cat('mean squares:')
contrasts^2 / (n*16)

Tests for Significance

Once we've computed effects and mean squares there are two different (equivalent) tests we could use to check for significance.

We can do the usual ANOVA $F$-test or an equivalent regression $t$-test.

F-test for significance

For the $F$-test we would compute $SS_{Total} = (N-1)\,s_y^2$ using the variance of $y$ and then subtract the treatment and interaction sums of squares to get $SSE = SS_{Total} - SS_A - SS_B - \cdots$

The sum of squares table looks like

Source    df          SS       MS       F           p-val
A         1           SS_A     MS_A     MS_A/MSE    1 - pf(F, 1, 2^k(n-1))
B         1           SS_B     MS_B     MS_B/MSE    1 - pf(F, 1, 2^k(n-1))
AB        1           SS_AB    MS_AB    MS_AB/MSE   1 - pf(F, 1, 2^k(n-1))
...
Error     2^k(n-1)    SSE      MSE
t-test for significance

Since our factor values are numerical ($\pm 1$), we could do a high-dimensional regression. Since there are only two levels for each factor, the effect between $-1$ and $+1$ is twice the slope of the line between the two average values. So the regression model is

$y = \beta_0 + \beta_A x_A + \beta_B x_B + \cdots + \beta_{AB} x_A x_B + \cdots + \varepsilon$

where $x_A = \pm 1$, etc., and $\beta_A = \tfrac{1}{2}\,\text{Effect}_A$, $\beta_{AB} = \tfrac{1}{2}\,\text{Effect}_{AB}$, etc.

Instead of doing an $F$-test for whether an effect is zero, we can do a $t$-test on whether the corresponding slope is zero. These tests should give the same results, since an $F$ test with 1 degree of freedom in the numerator is equivalent to a $t$ test on the square root. It isn't too hard to show that the regression $t$ statistic for a slope is

$t = \dfrac{\text{Contrast}}{\sqrt{N \cdot MSE}}$

(Recall that $SS = \text{Contrast}^2 / N$, so $t^2 = SS/MSE = F$.)

coeffs <- matrix(c(A,B,C,D,A*B,A*C,A*D,B*C,B*D,C*D,A*B*C,A*B*D,A*C*D,B*C*D,A*B*C*D),
				 ncol=16-1)

contrasts <- apply(t(coeffs) %*% Y, 1, sum)

SS <- contrasts^2 / (n*16)
					
SSE <- (n*16-1)*sd(Y)^2 - sum(SS)
MSE <- SSE / (16*(n-1))
					
aov <- data.frame(
 row.names=c('A','B','C','D','AB','AC','AD','BC','BD','CD','ABC','ABD','ACD','BCD','ABCD','Error'),
 df       =c(1  ,  1,  1, 1 , 1  ,  1 ,  1 ,  1 ,  1 ,  1 ,  1  ,  1  ,  1  ,  1  ,  1   , (n-1)*(2^4)),
 'Sum Sq' =c(SS, SSE),
 'Mean Sq'=c(SS, MSE),
 'F value'=c(SS/MSE,''),
 'p-value'=c(1-pf(SS/MSE,1,16*(n-1)),'')
)	

aov
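Equivalently, as claimed above, we can read the $t$ statistics for each slope directly off the contrasts and check that $t^2$ reproduces the $F$ values (a quick sanity check using the contrasts and MSE computed above):

## t statistics from contrasts: t = Contrast / sqrt(N * MSE)
tstat <- contrasts / sqrt(n*16 * MSE)
tstat^2                                  # equals SS/MSE, i.e. the F values above
2 * (1 - pt(abs(tstat), df = 16*(n-1)))  # two-sided p-values, same as the F-test p-values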

... of course, we could also do this with a single aov(..) function in R.

summary( aov( Y ~ A + B + C + D + A*B + A*C + A*D + B*C + B*D + C*D + A*B*C + A*B*D + A*C*D + B*C*D + A*B*C*D,
			 data=data) )

This is equivalent to a regression $t$-test.

summary( lm( Y ~ A + B + C + D + A*B + A*C + A*D + B*C + B*D + C*D + A*B*C + A*B*D + A*C*D + B*C*D + A*B*C*D,
			 data=data) )