I’m using the Explorer in Weka for classification.
I have an .arff file with two NUMERIC features and a binary class, 0 or 1 (e.g. {0,1}).
Sample:
@RELATION summary
@ATTRIBUTE feature1 NUMERIC
@ATTRIBUTE feature2 NUMERIC
@ATTRIBUTE class {1,0}
@DATA
23,11,0
20,100,1
2,36,0
98,8,1
.....
I load this .arff file, use 10-fold cross-validation (no test file), choose NaiveBayes, and classify the data. It reports 100 correctly labeled and 5 incorrectly labeled instances. So far so good.
Now I change my .arff file significantly (giving completely random values to my feature attributes) and repeat the steps above. I get the EXACT same statistics when classifying.
I tried this with more changes to my .arff file and with different classification algorithms. Still, I get the EXACT same statistics (within the same algorithm) no matter what feature values I put in my .arff file.
Am I doing something wrong here?
It’s hard to tell without more information, but I have two suggestions:
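What are the relative proportions of the two classes? Is it 5 to 100? Lots of algorithms don’t work well with highly skewed class label distributions. With a split like that, a classifier that always predicts the majority class would get 100 of 105 instances right regardless of the feature values, which would explain identical statistics after you randomize the features.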
Just a hunch, but try changing your class labels from numbers to strings (e.g. ‘class1’ and ‘class2’). Weka calls these ‘nominal’ attributes, so maybe numeric labels are not being handled the way you expect.
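To check the first suggestion outside of Weka, here is a minimal Python sketch that counts the class labels in an ARFF file and computes the accuracy you would get by always predicting the most frequent class. The function names `class_counts` and `majority_baseline` are made up for illustration. The sketch assumes the dense ARFF format shown in the question, with the class as the last comma-separated field and no quoted values containing commas.

```python
from collections import Counter


def class_counts(arff_text):
    """Count class labels, assuming the class is the last field of each data row."""
    counts = Counter()
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if line.upper().startswith("@DATA"):
            in_data = True  # everything after this marker is a data row
            continue
        if in_data:
            counts[line.split(",")[-1].strip()] += 1
    return counts


def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# The four rows visible in the question; the real file has more.
SAMPLE = """@RELATION summary
@ATTRIBUTE feature1 NUMERIC
@ATTRIBUTE feature2 NUMERIC
@ATTRIBUTE class {1,0}
@DATA
23,11,0
20,100,1
2,36,0
98,8,1
"""

counts = class_counts(SAMPLE)
print(dict(counts))
print(majority_baseline(counts))
```

If the baseline for your full file comes out at 100/105 (about 95%), the classifier is probably just predicting the majority class and ignoring the features. You can confirm this inside Weka by running the ZeroR classifier, which always predicts the majority class, and comparing its numbers with the NaiveBayes result.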