Does it help classification if I add linear or non-linear combinations of the existing features? For example, does it help to add the mean and variance, computed from the existing features, as new features? I believe it depends on the classification algorithm: PCA, for instance, itself generates new features that are orthogonal to each other and are linear combinations of the input features. But how does this affect decision-tree-based classifiers or others?
Yes, combinations of existing features can yield new features that help classification. Moreover, a combination of a feature with itself (e.g. a polynomial of the feature) can also serve as additional data during classification.
As an example, consider a logistic regression classifier with a linear formula like this at its core:
Imagine that you have 2 observations:
In both cases `g()` will be equal to 8. If the observations belong to different classes, you have no way to distinguish them. But let's add one more variable (feature) `z`, which is a combination of the previous 2 features: `z = x * y`. Now for the same observations we have:
So now we get 2 different points and can distinguish between 2 observations.
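The original formula and observations did not survive in this post, so here is a minimal sketch of the same idea with made-up numbers. The coefficients (`a1 = 1`, `a2 = 2`, `a3 = 1`) and the two observations `(2, 3)` and `(4, 2)` are assumptions, picked only so that the linear part evaluates to 8 for both; they are not necessarily the values used above.

```python
# Hypothetical coefficients and observations, chosen only so that the
# linear score is 8 for both points. They are illustrative assumptions.

def g_linear(x, y, a1=1.0, a2=2.0):
    """Linear score a1*x + a2*y (intercept omitted for brevity)."""
    return a1 * x + a2 * y

def g_with_interaction(x, y, a1=1.0, a2=2.0, a3=1.0):
    """Same score plus a term for the interaction feature z = x * y."""
    z = x * y
    return a1 * x + a2 * y + a3 * z

obs_a = (2.0, 3.0)
obs_b = (4.0, 2.0)

score_linear_a = g_linear(*obs_a)
score_linear_b = g_linear(*obs_b)
score_inter_a = g_with_interaction(*obs_a)
score_inter_b = g_with_interaction(*obs_b)

print(score_linear_a, score_linear_b)  # 8.0 8.0 -> indistinguishable
print(score_inter_a, score_inter_b)    # 14.0 16.0 -> now separable
```

In a real logistic regression the score would be passed through a sigmoid and the coefficients would be learned from data; the point here is only that the extra feature lets the model assign different scores to the two observations.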
Polynomial features (`x^2`, `x^3`, `y^2`, etc.) do not give additional points, but instead change the graph of the function. For example, `g(x) = a0 + a1*x` is a line, while `g(x) = a0 + a1*x + a2*x^2` is a parabola and thus can fit the data much more closely.
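To make the line-versus-parabola point concrete, here is a rough sketch that fits both forms by ordinary least squares and compares the squared error. The data (`y = x^2 + 1` on five points) and the helper names `solve`, `fit_polynomial` and `squared_error` are hypothetical, written in plain Python so the example has no dependencies; in practice you would more likely use a library routine.

```python
def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ...] via the normal equations."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    m = degree + 1
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    Aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve(AtA, Aty)

def squared_error(coeffs, xs, ys):
    """Sum of squared residuals of the fitted polynomial."""
    return sum(
        (sum(a * x ** k for k, a in enumerate(coeffs)) - y) ** 2
        for x, y in zip(xs, ys)
    )

# Illustrative data lying on a parabola (an assumption for this sketch).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 + 1.0 for x in xs]

line = fit_polynomial(xs, ys, 1)      # g(x) = a0 + a1*x
parabola = fit_polynomial(xs, ys, 2)  # g(x) = a0 + a1*x + a2*x^2

err_line = squared_error(line, xs, ys)
err_parabola = squared_error(parabola, xs, ys)
print(err_line, err_parabola)
```

On this data the best line is flat and leaves a large error, while the model with the added `x^2` feature fits the points essentially exactly. The model is still linear in its coefficients; only the feature set changed.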