I am trying to run a stepwise regression in R with 600-odd variables, whose names come from the header row of a .csv file.
How do I use the column names as variables in a regression equation?
I am very new to this, and my limited understanding is that I can save the column names as a list and use it when running a glm, e.g.
model.1 <- glm(x~ paste(list), family= poisson, link = logit).
Any help is highly appreciated. Thanks in advance.
If you have read your data in correctly (e.g. with `header=TRUE` as specified in the comments above), you should end up with a 600+-column data frame (1 column for the `x` response, and a column for each predictor variable): I will call this `mydata` for now. In that case, as @TylerRinker suggests, you could just include all the predictors: `glm(x~.,data=mydata,family=poisson)`. The log link is the default for the Poisson family; if you want to specify it explicitly you can say `glm(x~.,data=mydata,family=poisson(link="log"))`. Note that `link` is an argument of the family function rather than of `glm` itself, and that `logit` is not an available link for `poisson`. You could then use `step`, or `stepAIC` from the MASS package.

However, I have to add that unless you know what you’re doing, stepwise regression on 600 variables is a really, really, really BAD idea from a statistical point of view (Google something like “stepwise regression problems” or “stepwise regression Harrell”). I would strongly encourage you to take a look at something like the `glmnet` package, which takes a more sensible approach to modeling with lots of predictors.
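
To make the above concrete, here is a minimal sketch on simulated data. The data frame, the predictor names `v1` to `v20`, the seed and the coefficients are all made up for illustration; with a real file you would build `mydata` with `read.csv(..., header=TRUE)` instead. It shows the `.` shorthand, an equivalent formula built from the column names with `reformulate`, a backward `step` run, and (only if `glmnet` happens to be installed) a cross-validated lasso fit as one possible penalized alternative.

```r
# Simulated stand-in for the CSV data: a count response x plus 20 predictors.
# All names and values here are hypothetical.
set.seed(1)
n <- 200
p <- 20
preds <- as.data.frame(matrix(rnorm(n * p), n, p))
names(preds) <- paste0("v", seq_len(p))
x <- rpois(n, lambda = exp(0.3 + 0.5 * preds$v1 - 0.4 * preds$v2))
mydata <- data.frame(x = x, preds)

# Option 1: "." stands for every column in mydata other than the response.
full <- glm(x ~ ., data = mydata, family = poisson(link = "log"))

# Option 2: build the same formula explicitly from the column names.
predictor_names <- setdiff(names(mydata), "x")
f <- reformulate(predictor_names, response = "x")
full2 <- glm(f, data = mydata, family = poisson)

# Backward stepwise selection by AIC (shown for completeness only).
reduced <- step(full, direction = "backward", trace = 0)

# Penalized alternative: cross-validated lasso, run only if glmnet is installed.
if (requireNamespace("glmnet", quietly = TRUE)) {
  X <- as.matrix(mydata[, predictor_names])
  cvfit <- glmnet::cv.glmnet(X, mydata$x, family = "poisson")
  lasso_coef <- coef(cvfit, s = "lambda.1se")
}
```

Both `full` and `full2` fit the same model; the `reformulate` route is mainly useful when you want to use only a subset of the column names. With 600 columns the `step` call can be very slow, which is one more practical reason to prefer the penalized fit.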