Currently, I’m copying one instance at a time from one dataset to the other. Is there a way to do this so that string mappings remain intact? The mergeInstances method works horizontally; is there an equivalent vertical merge?
This is one step of a loop I use to read datasets of the same structure from multiple arff files into one large dataset. There has got to be a simpler way.
Instances iNew = new ConverterUtils.DataSource(name).getDataSet();
for (int i = 0; i < iNew.numInstances(); i++) {
    Instance nInst = iNew.instance(i);
    inst.add(nInst);
}
Why not make a new ARFF file which has the data from both of the originals? A simple
where
20 is replaced by however many lines long your arff header is. This would then produce a new arff file with all of the desired instances, and you could read this new file with your existing code.
You could also invoke Weka on the command line using this documentation: http://old.nabble.com/how-to-merge-two-data-file-a.arff-and-b.arff-into-one-data-list–td22890856.html
However, the documentation at http://weka.sourceforge.net/doc.dev/weka/core/Instances.html#main%28java.lang.String lists no function that will let you append multiple arff files natively within your Java code. As of Weka 3.7.6, the code that appends two arff files is this:
Thus it looks like Weka itself simply iterates through all of the instances in a data set and prints them, which is the same process your code uses.
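If you would rather do the file-level merge from Java than from a shell, here is a rough sketch that does not depend on counting header lines. It keeps the first file whole and, for every later file, copies only the rows after its @data line. The class name ArffConcat and the method concat are made up for illustration and are not part of Weka. It assumes every file has an identical header (same attributes, same nominal value lists, in the same order) and that @data sits on a line by itself; nothing is validated.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not a Weka class: plain text concatenation of ARFF files.
final class ArffConcat {

    // Writes the first input in full, then only the data rows of the remaining inputs.
    // Assumption: all inputs share the same header, so only the first header is kept.
    static void concat(List<Path> inputs, Path output) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            boolean first = true;
            for (Path input : inputs) {
                // The first file is copied from its first line; later files
                // are skipped until their @data marker has been seen.
                boolean inData = first;
                for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
                    if (inData) {
                        out.write(line);
                        out.newLine();
                    } else if (line.trim().equalsIgnoreCase("@data")) {
                        inData = true; // rows after this line are data
                    }
                }
                first = false;
            }
        }
    }
}
```

You would call it as ArffConcat.concat(listOfArffPaths, mergedPath) and then load the merged file once with the same DataSource call you already use. Because the rows are copied as text and parsed a single time, string and nominal values are mapped by one header rather than carried over as indices from separate datasets.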