I have two scripts that both generate random forests in R. As far as I can work out they have the same inputs, although my problem suggests this isn't the case. One of them returns an importance table containing
row.names importance.blue importance.red importance.MeanDecreaseAccuracy importance.MeanDecreaseGini
the other importance table just contains
row.names MeanDecreaseGini
What's the difference between these two forests, and more importantly, what's causing the difference given what I thought were identical inputs?
(The scripts are too large to paste here, but both are trying to predict a factor on the basis of a bunch of continuous variables)
The help page of randomForest tells us that importance (when used for classification) is a matrix with nclass + 2 columns. The first nclass columns are the class-specific measures computed as mean decrease in accuracy. The nclass + 1st column is the mean decrease in accuracy over all classes. The last column is the mean decrease in Gini index.
If importance=FALSE, the last measure is still returned as a vector.
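So it seems to me that you called randomForest once with importance=TRUE and once with importance=FALSE (which is the default, so one of your scripts probably sets importance=TRUE explicitly and the other leaves it out).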
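
To see the effect in isolation, here is a minimal sketch. It does not use your data: the data frame `d`, its columns `x1` and `x2`, and the two-level factor `colour` are made up for illustration, with the levels named blue and red only to mirror your column names.

```r
library(randomForest)

set.seed(1)

# Hypothetical toy data: two continuous predictors and a two-level factor.
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$colour <- factor(ifelse(d$x1 + rnorm(100, sd = 0.5) > 0, "blue", "red"))

# Same data and formula; only the importance argument differs.
rf_with    <- randomForest(colour ~ ., data = d, importance = TRUE)
rf_without <- randomForest(colour ~ ., data = d)  # importance defaults to FALSE

# One column per class, then the overall accuracy and Gini measures.
colnames(rf_with$importance)

# Only the Gini measure.
colnames(rf_without$importance)
```

If this reproduces what you see, compare the randomForest calls in your two scripts. Printing the `call` element of each fitted object (for example `rf_with$call`) should show the arguments each forest was built with.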