I am currently reading the Machine Learning book by Tom Mitchell. When talking about neural networks, Mitchell states:
“Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable.”
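To make the quote concrete, here is a minimal Python sketch of the perceptron training rule. It assumes Mitchell’s formulation (w_i ← w_i + η(t − o)x_i, outputs of +1 or −1, and a constant input x_0 = 1 so that w_0 acts as the threshold). The function name perceptron_train, the learning rate eta and the epoch cap max_epochs are illustrative choices, not taken from the book. AND is linearly separable, so the loop stops on its own. XOR is not, so without the cap the loop would never stop.

```python
def perceptron_train(examples, eta=0.1, max_epochs=1000):
    """Train a single perceptron with the perceptron training rule.

    examples: list of (inputs, target) pairs, target in {-1, +1}.
    A constant input x0 = 1 is prepended, so w[0] plays the role of the threshold.
    Returns (weights, converged); converged is True only if a full pass over
    the examples made no mistakes. max_epochs is an illustrative safety cap.
    """
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            x = [1.0] + list(inputs)
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if output != target:
                mistakes += 1
                # Perceptron rule: w_i <- w_i + eta * (t - o) * x_i
                w = [wi + eta * (target - output) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:
            return w, True
    return w, False


# AND is linearly separable; XOR is not.
and_examples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
xor_examples = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

print(perceptron_train(and_examples)[1])  # True: a separating line was found
print(perceptron_train(xor_examples)[1])  # False: gave up after max_epochs
```

The cap is what turns “fails to converge” into something observable: on XOR every pass misclassifies at least one example, so the weights keep changing until the loop gives up.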
I am having trouble understanding what he means by “linearly separable”. Wikipedia tells me that “two sets of points in a two-dimensional space are linearly separable if they can be completely separated by a single line.”
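Written as a formula (my own phrasing of the standard definition, not a quote from Wikipedia): a line in the plane is the set of points where w_1 x_1 + w_2 x_2 + b = 0, and two sets A and B are linearly separable if some choice of w_1, w_2, b puts every point of A strictly on one side of that line and every point of B strictly on the other.

```latex
A, B \subset \mathbb{R}^2 \text{ are linearly separable} \iff
\exists\, w_1, w_2, b \in \mathbb{R} :\quad
\begin{cases}
w_1 x_1 + w_2 x_2 + b > 0 & \text{for every } (x_1, x_2) \in A,\\
w_1 x_1 + w_2 x_2 + b < 0 & \text{for every } (x_1, x_2) \in B.
\end{cases}
```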
But how does this apply to the training set for neural networks? How can inputs (or action units) be linearly separable or not?
I’m not the best at geometry and maths – could anybody explain it to me as though I were 5? 😉 Thanks!
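Suppose you want to write an algorithm that decides, based on two parameters, size and price, whether a house will sell in the same year it was put on sale. So you have two inputs, size and price, and one output: will sell or will not sell. Now, when you receive your training set, it could happen that the outputs are not arranged in a way that makes the prediction easy. (Can you tell, based on the first graph, whether X will be an N or an S? How about in the second graph?)
[Graphs not reproduced: houses plotted by size and price. In the first graph the S and N points are mixed together; in the second a straight line separates them.]
Where: S = sold, N = not sold, X = a new house whose outcome we want to predict.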
As you can see in the first graph, you can’t really separate the two possible outputs (sold/not sold) with a straight line: no matter how you draw it, at least one side of the line will contain both S and N points. This means your algorithm has a lot of possible lines but no single correct line that splits the two outputs (and, of course, predicts new ones, which is the goal from the very beginning). That’s why linearly separable data sets (the second graph) are much easier to predict, and why the perceptron rule in Mitchell’s quote can fail to converge on data like the first graph: whatever line it tries, at least one training example is misclassified, so it never stops adjusting the weights.
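The house example can be sketched in Python as well. Everything here is hypothetical: the numbers are invented (size in square metres, price in thousands), the helper names predict and mistakes are my own, and the line “sells if price < 2 × size” was picked by hand rather than learned. Houses on the positive side of the line are predicted S, the rest N. The second data set is arranged like the first graph: the segment joining its two S houses crosses the segment joining its two N houses, so no straight line can split it.

```python
# Hypothetical numbers: size in square metres, price in thousands.
# Each house is (size, price, label) with label "S" (sold) or "N" (not sold).

def predict(line, size, price):
    """Classify a house by which side of the line it falls on.

    line = (w_size, w_price, b) describes w_size*size + w_price*price + b = 0.
    """
    w_size, w_price, b = line
    return "S" if w_size * size + w_price * price + b > 0 else "N"


def mistakes(line, houses):
    """Count houses that end up on the wrong side of the line."""
    return sum(predict(line, size, price) != label for size, price, label in houses)


# A hand-picked line: "sells if price < 2 * size", i.e. 2*size - price > 0.
line = (2.0, -1.0, 0.0)

separable = [(100, 150, "S"), (80, 120, "S"), (120, 200, "S"),
             (100, 250, "N"), (60, 180, "N"), (90, 220, "N")]
mixed = [(100, 150, "N"), (80, 120, "S"), (100, 250, "S"), (60, 180, "N")]

print(mistakes(line, separable))  # 0: this line splits S from N perfectly
print(mistakes(line, mixed))      # 2: and for this set every line gets at least one wrong
print(predict(line, 90, 150))     # S: the new house X lands on the "sold" side
```

Checking a single line only tells you about that line; showing that no line works at all needs an argument like the crossing segments above, or a learning rule that never settles.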