I am using the Weka GUI to run a NaiveBayes classifier on online posts. I am trying to track the instances (online posts) that are incorrectly predicted so that I can learn how to improve the features.
Currently, I have a workaround: I generate the data with a unique ID included, and when I import it into Weka I remove the unique ID. I then attach the prediction appender, which saves the prediction results to an .arff file. I read through that file to find the incorrectly classified instances. For each of them, I pick feature values that are distinctive enough to identify the instance and look for the instance with the same values in my original data, which contains the unique ID. As you can see, this is a very time-consuming process.
I would love to hear if there is a way to ignore a feature, which in my case is the unique ID of an instance, while keeping it as part of the data when running the classifier.
Thank you.
I'm not sure whether the Weka GUI has a direct option for that. However, you can achieve the same thing from the command line.
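As a rough sketch of what such an invocation could look like, the snippet below assembles one in Python so that each piece is visible. The file names train.arff and test.arff, the weka.jar location, the helper name build_weka_command, and the choice of NaiveBayes as the base classifier (taken from the question) are all assumptions for illustration. It also assumes a Weka version that still accepts -p and -distribution.

```python
import shlex


def build_weka_command(train_arff, test_arff, id_attribute_index=1,
                       weka_jar="weka.jar",
                       base_classifier="weka.classifiers.bayes.NaiveBayes"):
    """Assemble a FilteredClassifier call that hides string attributes from
    the model but prints the ID attribute next to each prediction.

    All default values here are hypothetical placeholders.
    """
    return [
        "java", "-cp", weka_jar,
        "weka.classifiers.meta.FilteredClassifier",
        "-t", train_arff,                 # training data
        "-T", test_arff,                  # test data to predict
        "-p", str(id_attribute_index),    # attribute(s) to print with predictions
        "-distribution",                  # print scores for every class label
        # Filter applied before the base classifier sees the data:
        "-F", "weka.filters.unsupervised.attribute.RemoveType -T string",
        "-W", base_classifier,
    ]


command = build_weka_command("train.arff", "test.arff")
# Print a copy-pastable shell command; run it yourself once the paths are right.
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be handed to subprocess.run if you prefer to stay in Python.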
In my example, the first attribute is an identifier (a string). The RemoveType filter removes all string attributes while the model is built. However, you can still ask Weka to include that identifier in the output (predictions) by passing its attribute index as the argument to -p. In my case the first attribute (partner_id) is the identifier, so it is listed in the output alongside the predictions. (The -distribution option outputs prediction scores for all class labels.) You can get more details from http://weka.wikispaces.com/Instance+ID
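Once the identifier is printed with each prediction, pulling out the misclassified instances can be scripted. The sketch below is only an illustration: it assumes the text layout Weka typically prints for -p with -distribution (a "+" in the error column and the identifier in parentheses at the end of the line), and the sample rows, class labels, and ID values are made up. Adjust the pattern if your Weka version prints something different.

```python
import re

# One prediction row: inst#, actual, predicted, optional "+", scores, (id).
PREDICTION_LINE = re.compile(
    r"^\s*(?P<inst>\d+)\s+(?P<actual>\S+)\s+(?P<predicted>\S+)\s+"
    r"(?P<error>\+)?\s*(?P<scores>\S+)\s+\((?P<id>.*)\)\s*$"
)


def misclassified(prediction_text):
    """Return (id, actual, predicted) for every row flagged with '+'."""
    rows = []
    for line in prediction_text.splitlines():
        match = PREDICTION_LINE.match(line)
        if match and match.group("error"):
            rows.append((match.group("id"),
                         match.group("actual"),
                         match.group("predicted")))
    return rows


# Hypothetical sample in the assumed output layout.
SAMPLE_OUTPUT = """\
=== Predictions on test data ===

 inst#     actual  predicted error distribution (partner_id)
     1      1:yes      1:yes       *0.9,0.1 (p001)
     2       2:no      1:yes   +   *0.7,0.3 (p002)
     3      1:yes       2:no   +   0.4,*0.6 (p003)
"""

for row in misclassified(SAMPLE_OUTPUT):
    print(row)
```

The returned IDs can then be looked up directly in the original data, which replaces the manual matching on feature values.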
Hope you find this helpful.