After getting my testlabel and trainlabel, i implemented SVM on libsvm and i got an accuracy of 97.4359%. ( c= 1 and g = 0.00375)
model = svmtrain(TrainLabel, TrainVec, '-c 1 -g 0.00375');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);
After i find the best c and g,
bestcv = 0;
for log2c = -1:3,
for log2g = -4:1,
cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
cv = svmtrain(TrainLabel,TrainVec, cmd);
if (cv >= bestcv),
bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
end
fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
end
end
c = 8 and g = 0.125
I implement the model again:
model = svmtrain(TrainLabel, TrainVec, '-c 8 -g 0.125');
[predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);
I get an accuracy of 82.0513%
How is it possible for the accuracy to decrease? shouldn’t it increase? Or am i making any mistake?
The accuracies that you were getting during parameter tuning are biased upwards because you were predicting the same data that you were training. This is often fine for parameter tuning.
However, if you wanted those accuracies to be accurate estimates of the true generalization error on your final test set, then you have to add an additional wrap of cross validation or other resampling scheme.
Here is a very clear paper that outlines the general issue (but in a similar context of feature selection): http://www.pnas.org/content/99/10/6562.abstract
EDIT:
I usually add cross validation like: