The problem: I cannot remove a lower-order parameter (e.g., a main-effect parameter) from a model as long as the higher-order parameters (i.e., interactions) remain in the model. If I try anyway, the model is refactored and the new model is not nested in the larger model.
See the following example (as I am coming from ANOVAs I use contr.sum):
d <- data.frame(A = rep(c("a1", "a2"), each = 50), B = c("b1", "b2"), value = rnorm(100))
options(contrasts=c('contr.sum','contr.poly'))
m1 <- lm(value ~ A * B, data = d)
m1
## Call:
## lm(formula = value ~ A * B, data = d)
##
## Coefficients:
## (Intercept)           A1           B1        A1:B1
##   -0.005645    -0.160379    -0.163848     0.035523
m2 <- update(m1, .~. - A)
m2
## Call:
## lm(formula = value ~ B + A:B, data = d)
## Coefficients:
## (Intercept)           B1       Bb1:A1       Bb2:A1
##   -0.005645    -0.163848    -0.124855    -0.195902
As can be seen, although I remove one term (A), the new model (m2) is refactored: it still has four coefficients and is not nested in the bigger model (m1). If I transform my factors by hand into numerical contrast variables I can get the desired result, but how do I get it using R’s factor capabilities?
The Question: How can I remove a lower-order factor in R and obtain a model that really lacks this parameter and is not refactored (i.e., the number of parameters in the smaller model must be lower)?
But why? I want to obtain ‘Type 3’-like p-values for an lmer model using the KRmodcomp function from the pbkrtest package. So this example is really just an example.
Why not CrossValidated? I have the feeling that this is really more of an R than a stats question (i.e., I know that you should never fit a model with interactions but without one of the main effects, but I still want to do it).
Here’s a sort of answer; there is no way that I know of to formulate this model directly by the formula …
Construct data as above:
Confirm original finding that just subtracting the factor from the formula doesn’t work:
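The two steps above can be sketched as follows. This is a reconstruction under assumptions, not necessarily the original code: the seed is arbitrary and is only there to make the sketch reproducible, so the coefficient values will differ from those shown in the question.

```r
set.seed(101)  # assumed seed, only for reproducibility of this sketch
d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                B = c("b1", "b2"),
                value = rnorm(100))
options(contrasts = c("contr.sum", "contr.poly"))

m1 <- lm(value ~ A * B, data = d)
m2 <- update(m1, . ~ . - A)   # formula becomes value ~ B + A:B

# Both fits still estimate four coefficients ...
length(coef(m1))
length(coef(m2))

# ... and produce the same fitted values, so m2 is just m1 reparameterized
all.equal(fitted(m1), fitted(m2))
```

Both lengths are 4 and the fitted values agree, which is another way of seeing that subtracting A from the formula did not remove a parameter.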
Formulate the new model matrix:
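A minimal sketch of this step, assuming the model matrix of the full fit m1 is the starting point and that the column named A1 is the one to drop. The seed and the names X0, X1 and m3 are illustrative. The reduced matrix is then handed to lm.fit, which accepts a model matrix in place of a formula.

```r
set.seed(101)  # assumed seed, only for reproducibility of this sketch
d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                B = c("b1", "b2"),
                value = rnorm(100))
options(contrasts = c("contr.sum", "contr.poly"))
m1 <- lm(value ~ A * B, data = d)

# Full model matrix; columns are (Intercept), A1, B1, A1:B1
X0 <- model.matrix(m1)

# Drop only the main-effect column for A, keep the interaction column
X1 <- X0[, colnames(X0) != "A1", drop = FALSE]

# Fit directly on the reduced matrix; the result is a plain list, not an "lm" object
m3 <- lm.fit(x = X1, y = d$value)
m3$coefficients   # three coefficients: (Intercept), B1, A1:B1
```

Because the columns of X1 are a subset of the columns of X0, this fit is nested in m1 and has one parameter fewer.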
lm.fit allows direct specification of the model matrix. This method only works for a few special cases that allow the model matrix to be specified explicitly (e.g. lm.fit, glm.fit).
More generally:
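One way to sketch the more general route is to turn the remaining model-matrix columns into ordinary numeric variables and build a formula from their names with reformulate. The seed and the names X1, newf, m4 and the column name AB11 are illustrative assumptions. The interaction column is renamed because a name such as A1:B1 would be parsed as an interaction inside a formula.

```r
set.seed(101)  # assumed seed, only for reproducibility of this sketch
d <- data.frame(A = rep(c("a1", "a2"), each = 50),
                B = c("b1", "b2"),
                value = rnorm(100))
options(contrasts = c("contr.sum", "contr.poly"))
m1 <- lm(value ~ A * B, data = d)

# Drop the intercept (lm adds its own) and the A main-effect column
X0 <- model.matrix(m1)
X1 <- X0[, !colnames(X0) %in% c("(Intercept)", "A1"), drop = FALSE]

# Give the columns syntactic names that are safe to use in a formula
colnames(X1) <- c("B1", "AB11")

# Build value ~ B1 + AB11 and fit on the numeric contrast columns
newf <- reformulate(colnames(X1), response = "value")
m4 <- lm(newf, data = data.frame(value = d$value, X1))

# m4 is nested in m1, so the comparison is a 1-df test of the A main effect
anova(m4, m1)
```

Since the reduced fit is now produced by a formula interface, the same idea should carry over to other model-fitting functions that take a formula and a data frame.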
This approach has the disadvantage that it won’t recognize multiple input variables as stemming from the same predictor (i.e., multiple factor levels from a more-than-2-level factor).