I have a 2-class dataset on which I should apply a binary classification algorithm.

Editorial Team

Asked: June 7, 2026, 00:56:27 UTC

I have a 2-class dataset on which I should apply a binary classification algorithm. The dataset looks as follows:

a1, a2, a3, ... +1
...
b1, b2, b3, ... -1
...

where each feature/attribute value is a 2-tuple. For example, a1 is (a1_1, a1_2). There is a dependency between a1_1 and a1_2 (though at this point I am not sure how they are related), and their order is not important. The same holds for the negative-class instances.

I am looking for some way of classifying these instances. Please let me know if such an algorithm exists.

To start with, I tried splitting the tuples – a1_1 and a1_2 formed two separate columns for an instance, leading to twice the number of feature values per instance – and used the LIBSVM (C/C++) library, but the results were not good. I suppose it is not meaningful to split the tuples, hence my search for a more suitable method.

1 Answer

Editorial Team · Answer 1 · 2026-06-07T00:56:30+00:00

All things being equal, I’d imagine that if your data actually contains pairs of things, then it would be useful to communicate that fact to the learning algorithm. Splitting your monolithic pairs into separate features gives your classification algorithm a chance to learn about any useful relationships that might exist between the two features.
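As a rough illustration of the splitting described above, here is a minimal Python sketch. The helper name `flatten_pairs` and the sample values are made up for the example, and it assumes the tuple members are numeric.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def flatten_pairs(instance: Sequence[Pair]) -> List[float]:
    """Turn [(a1_1, a1_2), (a2_1, a2_2), ...] into [a1_1, a1_2, a2_1, a2_2, ...]."""
    features: List[float] = []
    for first, second in instance:
        # Each pair contributes two adjacent columns.
        features.extend([first, second])
    return features


# Hypothetical instance with three pair-valued attributes.
instance = [(0.5, 1.5), (2.0, 0.1), (3.0, 3.0)]
flat = flatten_pairs(instance)
```

Each instance ends up with twice as many columns as it had pairs, which matches what the question describes doing before running LIBSVM.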

That’s just a general rule of thumb, however. There are several reasons that you might not be getting good classification results:

It’s possible that there is no useful relationship between the two features of a pair. If that’s the case, then splitting into two features has made your problem a lot harder: now the classification algorithm has an additional dimension to explore.
Maybe you haven’t found the right learning algorithm. Different algorithms have different strengths, and it’s possible that using multiple features is a fine idea provided that you use the proper classification algorithm. I’d suggest trying a supervised learning package like Weka, which provides a really easy way to compare a bunch of learning algorithms on a single problem. Just convert your data into .arff format and you’ll be classifying using SVMs, decision trees, neural networks, etc. in no time.
You might not be providing enough features. When you split a1, a2, ... into separate features [a1_1, a1_2], [a2_1, a2_2], ..., you could also include the reversed data, e.g. [a1_1, a1_2], [a1_2, a1_1], [a2_1, a2_2], [a2_2, a2_1], ... For that matter, you could use three features by including the non-split version for each datum, e.g. [a1, a1_1, a1_2], [a2, a2_1, a2_2], ... That’s kind of a “let’s throw every feature we can think of at the problem” approach.
It could be that you just have a very hard classification problem. Do you have any evidence that there is actually some signal in your input that a classifier could use to divide data into two groups?
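The "reversed data" idea from the list above could be sketched in Python roughly like this. This is one reading of it, in which each pair is emitted in both orders as extra features. The second helper, `order_free`, is not from the answer: it is an alternative, added here as an assumption, that sorts each pair so the encoding no longer depends on member order, which the question says is unimportant. All names and values are hypothetical, and numeric tuple members are assumed.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]


def with_reversed(instance: Sequence[Pair]) -> List[float]:
    """Emit each pair in both orders: [a_1, a_2, a_2, a_1, ...]."""
    features: List[float] = []
    for first, second in instance:
        features.extend([first, second, second, first])
    return features


def order_free(instance: Sequence[Pair]) -> List[float]:
    """Alternative (not from the answer): sort each pair so that swapping
    its two members leaves the feature vector unchanged."""
    features: List[float] = []
    for first, second in instance:
        features.extend(sorted((first, second)))
    return features


# Two hypothetical instances that differ only in the order inside each pair.
instance = [(0.5, 1.5), (2.0, 0.1)]
swapped = [(1.5, 0.5), (0.1, 2.0)]

doubled = with_reversed(instance)
canonical = order_free(instance)
```

With `order_free`, `instance` and `swapped` map to the same feature vector, so the classifier never has to learn that the two orderings are equivalent. Whether either encoding actually helps would have to be checked on the real data.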
