I have a data set with NAs sprinkled generously throughout. In addition it has

Editorial Team

Asked: May 29, 2026 (2026-05-29T09:25:37+00:00)

I have a data set with NAs sprinkled generously throughout.

In addition, it has columns that need to be factors.

I am using the rfe() function from the caret package to select variables.

It seems that rfe() with functions = lmFuncs works on the data with NAs but NOT on factor variables, while functions = rfFuncs works on factor variables but NOT on NAs.

Any suggestions for dealing with this?

I tried model.matrix() but it seems to just cause more problems.

1 Answer

Editorial Team · Answer 1 · 2026-05-29T09:25:38+00:00

Because of inconsistent behavior on these points between packages, not to mention the extra trickiness when going to more “meta” packages like caret, I always find it easier to deal with NAs and factor variables up front, before I do any machine learning.

For NAs, either omit or impute (median, knn, etc.).
For factor features, you were on the right track with model.matrix(). It lets you generate a series of “dummy” (0/1 indicator) features, one per level of the factor. The typical usage is something like this (the "- 1" in the formula drops the intercept, so every level gets its own column):

> dat = data.frame(x=factor(rep(1:3, each=5)))
> dat$x
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
> model.matrix(~ x - 1, data=dat)
   x1 x2 x3
1   1  0  0
2   1  0  0
3   1  0  0
4   1  0  0
5   1  0  0
6   0  1  0
7   0  1  0
8   0  1  0
9   0  1  0
10  0  1  0
11  0  0  1
12  0  0  1
13  0  0  1
14  0  0  1
15  0  0  1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"
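
Putting the two steps together, here is a minimal sketch in base R, not a definitive recipe. The toy data frame, its column names (num, grp) and the helper impute_column() are all made up for illustration, and median/most-frequent-level imputation is just one simple choice among those mentioned above. It also shows one possible reason model.matrix() seemed to cause more problems: with R's default na.action it silently drops every row that contains an NA, so it is safer to impute first.

```r
# Toy data (hypothetical): one numeric column and one factor, both with NAs.
dat <- data.frame(
  num = c(1.5, NA, 3.0, 4.5, NA, 6.0),
  grp = factor(c("a", "b", NA, "a", "c", "b"))
)

# With the default na.action (na.omit), model.matrix() drops rows with NAs,
# so the result has fewer rows than the input.
n_rows_without_imputation <- nrow(model.matrix(~ . - 1, data = dat))

# Hypothetical helper: fill NAs with the median (numeric columns)
# or with the most frequent level (factor columns).
impute_column <- function(col) {
  if (is.numeric(col)) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
  } else if (is.factor(col)) {
    # which.max() returns the first maximum, so ties go to the earlier level
    col[is.na(col)] <- names(which.max(table(col)))
  }
  col
}

# Impute every column, keeping the data frame structure.
dat_imputed <- dat
dat_imputed[] <- lapply(dat, impute_column)

# Expand the factor into dummy columns; all rows are kept now.
x <- model.matrix(~ . - 1, data = dat_imputed)
```

The resulting x is a plain numeric matrix with no NAs and no factor columns, which is the form I would then try passing to rfe(). If you prefer omitting over imputing, run na.omit() on the data frame (and on the response) yourself, so that predictors and response stay aligned.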

Also, just in case you haven’t (although it sounds like you have), the caret vignettes on CRAN are very nice and touch on some of these points. http://cran.r-project.org/web/packages/caret/index.html
