I am trying to create a loop that will perform a series of analyses. I am using geeglm from geepack, which fails if there are any missing (NA) values. Creating a subset solves this, but I do not seem to be able to set the subset dynamically based on a changing variable.
while (j <= y.num) {
strSubset = as.character(df.IV$IV[j]) #Gives column name in quotes
df.data.sub = subset(df.data, strSubset>=0)
#subset dataset is not created
# analyses on subset take place
j = j + 1
}
If I type the variable name in the formula it works, so I assume that I am not creating the variable in a manner that allows it to be evaluated in the subset function. Any help would be greatly appreciated!
Reproducible example:
# dataset
age<-18:29
height<-58:69
df.ex=data.frame(age=age,height=height)
df.ex[4,1]<-NA
# dataset of columns that will be used for analysis
values<-c("age", "height")
df.variables=data.frame(values)
# Age column has a null (NA) value. The row must be removed for the analysis to run
# explicit creation
df.ex.sub.explicit<-subset(df.ex, age >= 0)
dim(df.ex.sub.explicit) #11 obs of 2 variables
i=1
strFilter=as.character(df.variables$values[i])
df.ex.sub.passvar<-subset(df.ex,strFilter>=0)
dim(df.ex.sub.passvar) #12 obs of 2 variables
I would suggest:
It’s a little easier to store this list of variables as a character vector, unless you need the variables to be coupled with other information about the variables …
subset and $ are great for interactive use, but for programming it is probably best to use “machine-style” indexing with [ and [[ … also, you need to use is.na() to test for NA values. subset() has a quirk in that it will drop values for which the result of the test is either FALSE or NA, but it is probably clearer to use the explicit test.
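Here is a minimal sketch of that approach, using the example data from the question. The names vars and subsets are made up for illustration, and the analysis step is left as a placeholder rather than an actual geeglm call:

```r
# Example data from the question
df.ex <- data.frame(age = 18:29, height = 58:69)
df.ex[4, "age"] <- NA

# Variable names stored as a plain character vector (hypothetical name)
vars <- c("age", "height")

# Hypothetical container holding one subset per variable
subsets <- list()

for (v in vars) {
  # [[ looks the column up by the string stored in v,
  # so the column can change on every iteration
  keep <- !is.na(df.ex[[v]])

  # Keep only the rows where the current column is not NA
  subsets[[v]] <- df.ex[keep, , drop = FALSE]

  # ... analyses on subsets[[v]] would go here
}

sapply(subsets, nrow)
#>    age height
#>     11     12
```

Only rows with an NA in the current column are dropped, so the height subset keeps all 12 rows while the age subset keeps 11.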