I am new to ML and am working on a Kaggle competition to learn a bit. When I add certain features to my dataset, the accuracy decreases.
Why isn’t a feature that adds to the cost just weighted to zero (ignored)? Is it because non-linear features can lead to a local-minimum solution?
Thanks.
If you’re talking about training error for linear regression, then adding features can never increase your error unless you have a bug; at worst it stays the same. Like you say, it’s a convex problem, and the global solution can never be worse because the fit can just set the new feature’s weight to zero.
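Here is a quick way to check the training-error claim yourself. It is only a sketch on made-up data: the synthetic dataset, the `junk` features, and the `train_mse` helper are my own assumptions for illustration, not anything from your competition. It fits ordinary least squares with NumPy, once on the real features and once with unrelated random columns added.

```python
import numpy as np

def train_mse(X, y):
    """Mean squared training error of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)  # global least-squares optimum
    return float(np.mean((A @ w - y) ** 2))

rng = np.random.default_rng(0)
n = 50

# Hypothetical data: y depends linearly on three features plus noise.
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)

# Extra features that have nothing to do with y.
junk = rng.normal(size=(n, 5))

base = train_mse(X, y)
extended = train_mse(np.column_stack([X, junk]), y)

print("training MSE, original features:", base)
print("training MSE, with junk features:", extended)
```

The second number should never come out larger than the first (up to floating-point rounding), since zero weights on the junk columns would reproduce the first fit exactly.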
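If you’re talking about test error, however, then overfitting is going to be the big issue with adding features, and a drop in accuracy is certainly something you would observe. The extra weights are not set to zero because they can be used to fit noise in the training set, which lowers training error but does not carry over to new data.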
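To see the test-error side, here is a similar sketch that scores the same kind of fit on held-out data. Again, everything in it is assumed for illustration (the synthetic data, the sample sizes, and the `make_data` and `fit_and_score` helpers). The training set is deliberately small compared to the number of junk features so that the effect is easy to see.

```python
import numpy as np

def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit least squares on the training set; return (train MSE, test MSE)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    w, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    train = float(np.mean((A_train @ w - y_train) ** 2))
    test = float(np.mean((A_test @ w - y_test) ** 2))
    return train, test

rng = np.random.default_rng(0)
n_train, n_test, n_junk = 30, 5000, 25  # assumed sizes, chosen to exaggerate overfitting

def make_data(n):
    """Hypothetical data: y depends on one real feature; junk columns are pure noise."""
    x = rng.normal(size=(n, 1))
    junk = rng.normal(size=(n, n_junk))
    y = 2.0 * x[:, 0] + rng.normal(size=n)
    return x, junk, y

x_tr, junk_tr, y_tr = make_data(n_train)
x_te, junk_te, y_te = make_data(n_test)

few_train, few_test = fit_and_score(x_tr, y_tr, x_te, y_te)
many_train, many_test = fit_and_score(
    np.column_stack([x_tr, junk_tr]), y_tr,
    np.column_stack([x_te, junk_te]), y_te,
)

print("real feature only:  train", few_train, " test", few_test)
print("with junk features: train", many_train, " test", many_test)
```

With a setup like this you would expect the junk features to push training error down and test error up, which is the pattern you are describing. Regularization (ridge or lasso) or cross-validated feature selection are the usual ways to rein that in.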