I am using Weka GUI to run a NaiveBayes classifier on an online post.


Asked: June 12, 2026 (2026-06-12T04:27:40+00:00)


I am using the Weka GUI to run a NaiveBayes classifier on online posts. I am trying to track the instances (online posts) that are predicted incorrectly so that I can learn how to improve the features.

Currently, I have a workaround: I generate the data with a unique ID included, and when I import it into Weka I remove the unique ID. I then attach the prediction appender, which saves the prediction results to an .arff file. I read through the file to find the instances that were predicted incorrectly. For each incorrectly classified instance, I use feature values that are distinctive enough to identify it and look for the instance with the same values in my original data, which contains the unique ID. As you can see, this is a truly time-consuming process.

I would love to hear if there is a way to ignore a feature, which in my case is the unique ID of an instance, while keeping it as part of the data when running the classifier.

Thank you.


1 Answer

Editorial Team · Answer 1 · 2026-06-12T04:27:42+00:00

I’m not sure whether the Weka GUI has a direct option for that. However, you can achieve the same thing from the command line:

java weka.classifiers.meta.FilteredClassifier -F weka.filters.unsupervised.attribute.RemoveType -W weka.classifiers.trees.RandomForest -t G:\pub-resampled-0.5.arff -T G:\test.csv.arff -p 1 -distribution > G:\out.txt

In the above example, the first attribute is an identifier (a string). The RemoveType filter removes all string attributes while the model is built. However, you can still ask Weka to include that identifier in the prediction output by passing its attribute index as the argument to -p. In my case the first attribute (partner_id) is the identifier, so it is listed in the output along with the predictions. (The -distribution option outputs the prediction scores for all class labels.) You can get more details from http://weka.wikispaces.com/Instance+ID

=== Predictions on test data ===

 inst#     actual  predicted error distribution (partner_id)
     1        1:?        2:0       0,*1 (8i7t3)
     2        1:?        2:0       0,*1 (8i7u1)
     3        1:?        2:0       0,*1 (8i7um)
     4        1:?        2:0       0.1,*0.9 (8i7ux)
     5        1:?        2:0       0,*1 (8i7va)
     6        1:?        2:0       0,*1 (8i7vb)
     7        1:?        2:0       0,*1 (8i7vf)
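Once the identifier appears in the output, the misclassified instances can be picked out by script instead of by hand. The following is a minimal Python sketch, assuming the column layout shown above and assuming that Weka marks a wrong prediction with a "+" in the error column when the test set is labeled. The sample rows, the IDs p1 to p3, and the function name misclassified_ids are made up for illustration and do not come from the run above.

```python
import re

# One prediction row: inst#, actual, predicted, optional "+" error flag,
# distribution (or single probability), and the identifier in parentheses.
# Assumption: this mirrors the layout of the "-p 1 -distribution" output above.
ROW = re.compile(
    r"^\s*(?P<inst>\d+)\s+(?P<actual>\S+)\s+(?P<predicted>\S+)\s+"
    r"(?P<error>\+\s+)?(?P<dist>\S+)\s+\((?P<id>[^)]*)\)\s*$"
)


def misclassified_ids(lines):
    """Return the identifiers of rows whose error column contains '+'."""
    ids = []
    for line in lines:
        match = ROW.match(line)
        # Header lines and blank lines do not match and are skipped.
        if match and match.group("error"):
            ids.append(match.group("id"))
    return ids


# Hypothetical sample output for a labeled test set (not real results).
SAMPLE = """\
=== Predictions on test data ===

 inst#     actual  predicted error distribution (post_id)
     1      1:yes      1:yes       *0.8,0.2 (p1)
     2      1:yes       2:no   +   0.3,*0.7 (p2)
     3       2:no       2:no       0.1,*0.9 (p3)
"""

bad = misclassified_ids(SAMPLE.splitlines())
print(bad)
```

To use it on a real run, read the redirected file instead of SAMPLE, for example misclassified_ids(open("out.txt")), and look up the returned IDs in the original data.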

Hope you find this helpful.
