I have a 2-class dataset to which I need to apply a binary classification algorithm. The dataset looks as follows:
a1, a2, a3, …… +1
……
b1, b2, b3, …….-1
…….
where each feature/attribute value is a 2-tuple. For example, a1 is (a1_1, a1_2). There is a dependency between a1_1 and a1_2 (though at this point I am not sure how they are related), and their order is not important. The same holds for the negative class instances.
I am looking for some way of classifying these instances. Please let me know if such an algorithm exists.
To start with, I tried splitting the tuples (a1_1 and a1_2 formed two separate columns for an instance, giving twice the number of feature values per instance) and used the LIBSVM (C/C++) library, but the results were not good. I suspect that splitting the tuples is not meaningful, hence my search for a more suitable method.
All things being equal, I’d imagine that if your data actually contains pairs of things, then it would be useful to communicate that fact to the learning algorithm. Splitting your monolithic pairs into separate features gives your classification algorithm a chance to learn about any useful relationships that might exist between the two features.
That’s just a general rule of thumb, however. There are several reasons that you might not be getting good classification results:
… .arff format and you’ll be classifying using SVMs, decision trees, neural networks, etc. in no time.
… a1, a2, ... into separate features [a1_1, a1_2], [a2_1, a2_2], ..., you could also include the reversed data as well, e.g. [a1_1, a1_2], [a1_2, a1_1], [a2_1, a2_2], [a2_2, a2_1], ...
For that matter, you could also use three features by including the non-split version for each datum, e.g. [a1, a1_1, a1_2], [a2, a2_1, a2_2], ... That’s kind of a “let’s throw every feature we can think of at the problem” approach.
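To make the reversed-data idea a bit more concrete, here is a minimal Python sketch. It reads the suggestion as adding, for every training row, a second row with each pair swapped and the same label; that reading is an assumption, as is reversing all pairs of a row at once rather than every combination. The helper names (split_pairs, augment_with_reversed, sorted_pairs) are hypothetical. The last helper, sorted_pairs, is a different technique that was not suggested above: it puts each pair into a canonical (smaller, larger) order so that the classifier never sees the arbitrary ordering at all.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def split_pairs(instance: Sequence[Pair]) -> List[float]:
    """Flatten [(a1_1, a1_2), (a2_1, a2_2), ...] into [a1_1, a1_2, a2_1, a2_2, ...]."""
    return [value for pair in instance for value in pair]


def split_pairs_reversed(instance: Sequence[Pair]) -> List[float]:
    """Same as split_pairs, but with the two members of every pair swapped."""
    return [value for first, second in instance for value in (second, first)]


def augment_with_reversed(
    instances: Sequence[Sequence[Pair]], labels: Sequence[int]
) -> Tuple[List[List[float]], List[int]]:
    """Return each original row followed by its reversed copy, both with the same label.

    Assumption: all pairs in a row are reversed together. With n pairs per row
    there are 2**n possible orderings; this sketch adds only one of them.
    """
    rows: List[List[float]] = []
    out_labels: List[int] = []
    for instance, label in zip(instances, labels):
        rows.append(split_pairs(instance))
        out_labels.append(label)
        rows.append(split_pairs_reversed(instance))
        out_labels.append(label)
    return rows, out_labels


def sorted_pairs(instance: Sequence[Pair]) -> List[float]:
    """Alternative (not from the answer above): emit each pair as (min, max)."""
    return [value for pair in instance for value in sorted(pair)]


if __name__ == "__main__":
    # Made-up toy data: two instances, two pair-valued attributes each.
    instances = [[(1.0, 2.0), (5.0, 3.0)], [(4.0, 4.5), (0.5, 0.1)]]
    labels = [+1, -1]
    rows, ys = augment_with_reversed(instances, labels)
    for row, y in zip(rows, ys):
        print(y, row)
    print(sorted_pairs(instances[0]))
```

The resulting rows are plain lists of numbers, so they can be written out in whatever input format your classifier expects. Whether the augmentation or the canonical ordering helps depends on the data; both are cheap to try before looking for a specialised algorithm.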