I’m currently trying to build an LDA model on a dataset that contains some missing (`NA`) values, and I want to handle them by, for example, imputing the mean. From what I understand, I can set `na.action=na.omit` in the `lda` and `predict` functions, which will remove the incomplete observations when building the model and force `NA` to be returned for them when making predictions.
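```r
library(MASS)  # lda() and its predict method come from MASS

# Column 1 is the grouping; columns 2 and 3 are predictors (one NA)
my.dat <- as.data.frame(cbind(
  c(0, 1, 0, 1, 1, 0),
  c(5, 8, 9, 1, -1, NA),
  c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)
))
mod <- lda(my.dat[,-1], my.dat[,1], na.action=na.omit)
predict(mod, my.dat[,-1], na.action=na.omit)
```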
But now I want to impute the column means wherever I have an `NA` value instead. I can define my own `na.impute` function, but I cannot work out what is passed to this function or what it needs to return. As a test, I wrote one that just prints its argument:
```r
# Test na.action: show what lda() passes in, then return it unchanged
na.impute <- function (object) {
  print(object)
  object
}
```
Passing it to `lda` as `na.action=na.impute` gives me this output:
```
[1] g x
<0 rows> (or 0-length row.names)
```
This doesn’t make much sense to me, and I cannot find any guidance in the documentation. What exactly is `object`, and how am I supposed to manipulate it to overwrite the `NA` values?
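Here is the first way to find out what `object` is: inspect it from inside the `na.action` function itself. It really is an unusual object, `structure(list(g = grouping, x = x), class = "data.frame")`: a list holding the grouping vector `g` and the predictor matrix `x`, with class `"data.frame"` but no row names. The missing row names are why `print()` reports `<0 rows> (or 0-length row.names)`.

Another way to see this is to follow the source code, starting from the generic function `lda`. Because `my.dat[,-1]` is a data frame, in this case we are interested in the method `lda.data.frame`. Since it is asterisked (non-visible) in `methods(lda)`, we have to use either `MASS:::lda.data.frame` or `getAnywhere("lda.data.frame")` to see its source code. That source shows that `lda.matrix` is needed next, so again we use one of these two functions. In `lda.matrix` we finally find the call of `na.action`, which is what we expected: `na.action` is applied to the `list(g, x)` object above, and the `g` and `x` components of its result are then used as the grouping and the data for the fit. An imputing `na.action` therefore has to replace the `NA` values in `x` (for example with column means) and return an object with the same `g` and `x` components.

As for `predict` and `na.action`, that is not an available option: see `getAnywhere("predict.lda")`. It has no `na.action` argument, so the argument you pass ends up in `...` and is not used. `NA` values in new data have to be imputed before calling `predict`.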
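A minimal sketch of a mean-imputing `na.action`, assuming the `lda.matrix` behaviour described above (the function receives `structure(list(g = grouping, x = x), class = "data.frame")` and only the `g` and `x` components of its return value are read back). `impute_col_means()` is a hypothetical helper introduced here, not part of MASS. Because `predict.lda` ignores `na.action`, the same helper is applied to the new data by hand, reusing the training column means.

```r
library(MASS)

# Hypothetical helper: replace NAs in each column of a numeric matrix with
# the matching entry of `means` (default: that column's own mean).
impute_col_means <- function(x, means = colMeans(x, na.rm = TRUE)) {
  x <- as.matrix(x)
  for (j in seq_len(ncol(x))) {
    x[is.na(x[, j]), j] <- means[j]
  }
  x
}

# na.action for lda(): unclass() first, because the incoming "data frame"
# has no row names, so data-frame methods (e.g. `$<-`) see it as 0 rows.
# A plain list with $g and $x is all lda.matrix reads back.
na.impute <- function(object) {
  obj <- unclass(object)
  obj$x <- impute_col_means(obj$x)
  obj
}

my.dat <- as.data.frame(cbind(
  c(0, 1, 0, 1, 1, 0),
  c(5, 8, 9, 1, -1, NA),
  c(-2.4, -4.0, -4.4, -0.5, 0.7, -0.3)
))

# Fit on all 6 rows, with the NA in column 2 replaced by its column mean
mod <- lda(my.dat[, -1], my.dat[, 1], na.action = na.impute)

# predict.lda() ignores na.action, so impute the new data explicitly,
# using the column means of the training predictors.
train_means <- colMeans(my.dat[, -1], na.rm = TRUE)
pred <- predict(mod, impute_col_means(my.dat[, -1], train_means))
pred$class
```

Reusing the training means at prediction time keeps a single new row with an `NA` from being imputed with its own (undefined) mean; whether mean imputation is statistically appropriate for your data is a separate question.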