Possible Duplicate:
short formula call for many variables when building a model
I have a biggish data frame (112 variables) that I’d like to run a stepwise logistic regression on using R. I know how to set up the glm model and the stepAIC model, but I’d rather not type in all the headings to specify the independent variables. Is there a fast way to give the glm model an entire data frame as independent variables, so that it treats each column as an x variable to be included in the model? I tried:
ft<-glm(MFDUdep~MFDUind, family=binomial)
But it didn’t work (wrong data types). MFDUdep and MFDUind are both data frames, with MFDUind containing 111 ‘x’ variables and MFDUdep containing a single ‘y’.
You want the `.` special symbol in the formula notation. Also, it is probably better to have the response and predictors in a single data frame. Try:
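A minimal sketch of the idea, using simulated stand-ins for `MFDUdep` and `MFDUind`. The column names `y`, `x1`, `x2`, `x3` and the data itself are assumptions made for illustration; the real data frames would have 1 and 111 columns respectively.

```r
# Simulated stand-ins for the asker's data frames (hypothetical data)
set.seed(42)
n <- 200
MFDUind <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
MFDUdep <- data.frame(y = rbinom(n, 1, plogis(0.5 * MFDUind$x1)))

# Put the response and the predictors in one data frame
dat <- cbind(MFDUdep, MFDUind)

# On the right-hand side, `.` means "every column of `data`
# that does not already appear in the formula"
ft <- glm(y ~ ., data = dat, family = binomial)

# Lists the intercept plus one coefficient per predictor column
names(coef(ft))
```

The fitted object can then be handed to `stepAIC()` from MASS as usual, since the formula has been expanded to the individual column names.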
Now that I have given you the rope, I am obliged to at least warn you about the potential for hanging…
The approach you are taking is usually not the recommended one, unless perhaps prediction is the purpose of the model. Regression coefficients for the selected variables may be strongly biased, so if you are using this for enlightenment, rethink your approach.
You will also need a lot of observations to allow 100+ terms in a model.
Better alternatives exist; e.g. see the glmnet package, which supports ridge, lasso, or combined (elastic net) constraints on the set of coefficients. These allow one to minimise model error at the expense of a small amount of additional bias.
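As a rough sketch of what a penalised fit could look like, assuming the glmnet package is installed and again using simulated data with hypothetical column names (`y`, `x1`, `x2`, `x3`). The choice of `alpha = 0.5` is arbitrary and only meant to show an elastic net mix.

```r
library(glmnet)  # assumes the glmnet package is installed

# Simulated data (hypothetical): binary response and three predictors
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1))

# glmnet takes a numeric predictor matrix rather than a formula;
# model.matrix() expands `.` and [, -1] drops the intercept column
x <- model.matrix(y ~ ., data = dat)[, -1]
y <- dat$y

# alpha = 0 is ridge, alpha = 1 is lasso, values in between are elastic net;
# cv.glmnet() picks the penalty strength lambda by cross-validation
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 0.5)

# Coefficients at the lambda with the lowest cross-validated error
coef(cv_fit, s = "lambda.min")
```

With lasso or elastic net penalties some coefficients are shrunk exactly to zero, so variable selection happens as part of the fit rather than in a separate stepwise pass.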