I trained a random forest:
model <- randomForest(x, y, proximity=TRUE)
When I want to predict y for new objects, I use
y_pred <- predict(model, xnew)
How can I calculate the proximity between the new objects (xnew) and the training set (x) based on the already existing forest (model)?
The proximity option in the predict function gives only the proximities among the new objects (xnew). I could run randomForest unsupervised again on a combined data set (x and xnew) to get the proximities, but I think there must be some way to avoid building the forest again and to use the already existing one instead.
Thanks!
Kilian
I believe what you want is to specify your test observations in the randomForest call itself, through its xtest argument. With ten test cases held out of a 150-row data set, for example, that gives you the proximity from the ten test cases to all 150.
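As a rough sketch of that idea, the snippet below holds out ten rows of the built-in iris data as stand-in "new" cases and passes them through xtest. The iris split, the seed, and the variable names (test_idx, prox_all, prox_test_train) are assumptions made for illustration, not part of the original question. It relies on the test-set proximity matrix carrying the row names of the test and training data as column names, which is how I understand the package to behave.

```r
library(randomForest)

set.seed(71)
# Illustrative split: 10 of the 150 iris rows play the role of xnew
test_idx <- sample(nrow(iris), 10)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Supplying xtest makes randomForest compute test-set proximities too
rf <- randomForest(x = train[, 1:4], y = train[, 5],
                   xtest = test[, 1:4],
                   proximity = TRUE)

# One row per test case; columns cover the test cases and the training cases
prox_all <- rf$test$proximity
dim(prox_all)

# Keep only the columns that belong to training cases (selected by row name)
prox_test_train <- prox_all[, rownames(train), drop = FALSE]
```

Note that when xtest is supplied the forest is not kept by default, so add keep.forest = TRUE if you also want to call predict on the fitted object later.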
The only other option would be to call predict on your new cases rbinded to the original training cases, I think. But that way you don’t need to have your test cases up front with the randomForest call.

In that case, you’ll want to use keep.forest = TRUE in the randomForest call and of course set proximity = TRUE when you call predict.
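A minimal sketch of the predict route, again using iris as a stand-in for x and xnew (the split, the seed, and names such as combined and prox_new_train are illustrative assumptions). The new cases are stacked on top of the training cases, so the block of interest is the first nrow(xnew) rows against the remaining columns.

```r
library(randomForest)

set.seed(71)
# Illustrative data: 140 training rows, 10 "new" rows
test_idx <- sample(nrow(iris), 10)
x    <- iris[-test_idx, 1:4]
y    <- iris[-test_idx, 5]
xnew <- iris[test_idx, 1:4]

# Keep the forest so it can be reused by predict
model <- randomForest(x, y, keep.forest = TRUE)

# Stack the new cases on top of the training cases and reuse the same forest
combined <- rbind(xnew, x)
out <- predict(model, combined, proximity = TRUE)

n_new <- nrow(xnew)
# Rows: new cases; columns: training cases
prox_new_train <- out$proximity[seq_len(n_new), -seq_len(n_new), drop = FALSE]
# Predictions for the new cases only
y_pred <- out$pred[seq_len(n_new)]
```

The forest is built once and only reused here, at the cost of also computing the training-to-training proximities, which are then discarded.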