I am trying to run a classification with Weka from the command line. The Weka primer (weka-Primer-commandline) gives the following example for creating training and test sets:
java weka.filters.supervised.instance.StratifiedRemoveFolds -i data/soybean.arff -o soybean-train.arff \
-c last -N 4 -F 1 -V
java weka.filters.supervised.instance.StratifiedRemoveFolds -i data/soybean.arff -o soybean-test.arff \
-c last -N 4 -F 1
Is this piece of code supposed to put 3/4 of the data in the training set and 1/4 in the test set? It seems to me that only one fold of the whole data (taken from the end because of -V) will be assigned to training and only one fold to testing. Am I right? I need 3/4 of the data for training and 1/4 for testing.
The code in the documentation is correct and puts 3/4 of the data in the train set and 1/4 in the test set. The options mean the following:
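-N: There should be 4 folds, meaning that the data is split into 4 (roughly) equal, non-overlapping parts.
-F: The first of these parts/folds should be selected.
-V: Invert the selection.
So the first command, which creates the training set, works as follows: the data is split into 4 folds, the first fold is selected, and -V inverts that selection, so folds 2 to 4 are kept. That is 3/4 of the data.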
The test set is simply created by selecting only the first fold, which is 1/4 of the data.
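
To make the fold logic concrete, here is a minimal Python sketch of the idea. It is not Weka's implementation: the function names stratified_folds and remove_folds and the toy labels are made up for illustration, and Weka's actual assignment of instances to folds may differ in detail. It only shows how selecting fold 1 of 4, with and without inversion, yields a 3/4 and 1/4 split.

```python
from collections import defaultdict


def stratified_folds(labels, num_folds):
    """Deal instance indices into folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(num_folds)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % num_folds].append(index)
            position += 1
    return folds


def remove_folds(labels, num_folds, fold, invert=False):
    """Mimic -N num_folds -F fold, plus -V when invert is True (hypothetical helper)."""
    selected = set(stratified_folds(labels, num_folds)[fold - 1])
    if invert:
        # Like -V: keep everything except the selected fold.
        return [i for i in range(len(labels)) if i not in selected]
    return sorted(selected)


# Toy data: 12 instances in two classes (an assumption, not the soybean set).
labels = ["a"] * 8 + ["b"] * 4

train = remove_folds(labels, num_folds=4, fold=1, invert=True)  # -N 4 -F 1 -V
test = remove_folds(labels, num_folds=4, fold=1)                # -N 4 -F 1

print(len(train), len(test))  # 9 3
```

The two results never overlap and together cover every instance, which is why running the same command with and without -V gives a matching train/test pair.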