Could someone please tell me whether the training sample sizes for each class need to be equal?
For example, is the following distribution acceptable?

         class1  class2  class3
samples  400     500     300

Or should all the classes have equal sample sizes?
The KNN results basically depend on three things (apart from the value of K):
Consider the following example where you’re trying to learn a donut-like shape in a 2D space.
If the density of your training data differs between classes (let's say you have more training samples inside the donut than outside), your decision boundary will be biased like below:
On the other hand, if your classes are relatively balanced, you’ll get a much finer decision boundary that will be close to the actual shape of the donut:
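To make the donut example concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: the ring radii (1 to 2), the sampling square, the class sizes, K = 15, and the helper names. It trains a simple majority-vote KNN once on an imbalanced set and once on a balanced set, then reports per-class recall on the same balanced test set.

```python
import math
import random
from collections import Counter

def sample_point(rng, inside):
    """Rejection-sample a point in [-3, 3]^2 that lies inside (or outside) the ring 1 <= r <= 2."""
    while True:
        x, y = rng.uniform(-3, 3), rng.uniform(-3, 3)
        if (1.0 <= math.hypot(x, y) <= 2.0) == inside:
            return (x, y)

def make_set(rng, n_inside, n_outside):
    """Build a labelled set: label 1 = inside the donut, label 0 = outside."""
    points = [(sample_point(rng, True), 1) for _ in range(n_inside)]
    points += [(sample_point(rng, False), 0) for _ in range(n_outside)]
    return points

def knn_predict(train, query, k):
    """Plain KNN: majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def recall_per_class(train, test, k):
    """Fraction of test points of each class that are classified correctly."""
    hits, totals = Counter(), Counter()
    for point, label in test:
        totals[label] += 1
        hits[label] += knn_predict(train, point, k) == label
    return {label: hits[label] / totals[label] for label in totals}

rng = random.Random(0)                  # fixed seed so the run is repeatable
test = make_set(rng, 200, 200)          # balanced test set
imbalanced = make_set(rng, 600, 30)     # far more samples inside the donut
balanced = make_set(rng, 300, 300)      # same number of samples per class
K = 15                                  # odd, so a two-class vote cannot tie

recall_imbalanced = recall_per_class(imbalanced, test, K)
recall_balanced = recall_per_class(balanced, test, K)
print("imbalanced training:", recall_imbalanced)
print("balanced training:  ", recall_balanced)
```

With the imbalanced training set, the dense "inside" class should win most votes even for points that are clearly outside the donut, so the recall for class 0 drops sharply compared with the balanced run.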
So basically, I would advise trying to balance your dataset (just even out the class sizes somehow), and also take into consideration the 2 other items I mentioned above, and you should be fine.
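One simple way to even out class sizes is random undersampling: keep only as many samples from each class as the smallest class has. The sketch below assumes that approach (oversampling the smaller classes is another option) and uses the class sizes from the question. The `undersample` helper and the placeholder integer "samples" are hypothetical and only there for illustration.

```python
import random
from collections import Counter, defaultdict

def undersample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # rng.sample draws `target` distinct items without replacement
        balanced.extend((sample, label) for sample in rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced

# Class sizes from the question: 400, 500 and 300 samples.
labels = ["class1"] * 400 + ["class2"] * 500 + ["class3"] * 300
samples = list(range(len(labels)))      # placeholders for real feature vectors

balanced = undersample(samples, labels)
balanced_counts = Counter(label for _, label in balanced)
print(balanced_counts)                  # every class now has 300 samples
```

The trade-off is that undersampling throws data away (here 100 samples of class1 and 200 of class2), so it is mostly worth it when the smallest class is still reasonably large.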
If you have to deal with imbalanced training data, you could also consider using the weighted KNN (WKNN) algorithm, which is just a variation of KNN, to assign stronger weights to the class that has fewer elements.
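Here is a minimal sketch of that idea. Note that weighted KNN is more commonly described with distance-based weights. This version instead weights each neighbour's vote by its class, as suggested above, and assumes a weight of 1 / (number of training samples in that class). The function name `class_weighted_knn` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def class_weighted_knn(train, query, k, class_weights=None):
    """KNN vote where each neighbour's vote is scaled by the weight of its class.

    If no weights are given, each class gets 1 / (its number of training samples),
    so neighbours from rare classes count for more.
    """
    if class_weights is None:
        counts = Counter(label for _, label in train)
        class_weights = {label: 1.0 / n for label, n in counts.items()}
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for _, label in nearest:
        scores[label] += class_weights[label]
    return max(scores, key=scores.get)

# Toy data: 8 samples of class "A" but only 2 of class "B".
train = [
    ((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((0.0, 0.1), "A"),
    ((1.0, 1.0), "A"), ((1.2, 1.0), "A"), ((1.0, 1.2), "A"),
    ((2.0, 2.0), "A"), ((2.0, 2.2), "A"),
    ((0.2, 0.2), "B"), ((0.3, 0.1), "B"),
]
query = (0.15, 0.1)

# The 5 nearest neighbours are 3 x "A" and 2 x "B".
plain = class_weighted_knn(train, query, k=5, class_weights={"A": 1.0, "B": 1.0})
weighted = class_weighted_knn(train, query, k=5)
print(plain)     # A  (3 votes against 2)
print(weighted)  # B  (2 * 1/2 = 1.0 against 3 * 1/8 = 0.375)
```

Inverse class frequency is only one possible weighting. The weights can also be passed in explicitly through `class_weights` if you want to tune them.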