I’m cross-posting this from math.stackexchange.com because I’m not getting any feedback and it’s a time-sensitive question for me.
My question pertains to linear separability with hyperplanes in a support vector machine.
According to Wikipedia:
…formally, a support vector machine
constructs a hyperplane or set of
hyperplanes in a high or infinite
dimensional space, which can be used
for classification, regression or
other tasks. Intuitively, a good
separation is achieved by the
hyperplane that has the largest
distance to the nearest training data
points of any class (so-called
functional margin), since in general
the larger the margin the lower the
generalization error of the
classifier.classifier.
The linear separation of classes by hyperplanes intuitively makes sense to me. And I think I understand linear separability for two-dimensional geometry. However, I’m implementing an SVM using a popular SVM library (libSVM) and when messing around with the numbers, I fail to understand how an SVM can create a curve between classes, or enclose central points in category 1 within a circular curve when surrounded by points in category 2 if a hyperplane in an n-dimensional space V is a “flat” subset of dimension n − 1, or for two-dimensional space – a 1D line.
Here is what I mean:

That’s not a hyperplane. That’s circular. How does this work? Or are there more dimensions inside the SVM than the two-dimensional 2D input features?
This example application can be downloaded here.
Edit:
Thanks for your comprehensive answers. So the SVM can separate weird data well by using a kernel function. Would it help to linearize the data before sending it to the SVM? For example, one of my input features (a numeric value) has a turning point (eg. 0) where it neatly fits into category 1, but above and below zero it fits into category 2. Now, because I know this, would it help classification to send the absolute value of this feature for the SVM?
As mokus explained, support vector machines use a kernel function to implicitly map data into a feature space where they are linearly separable:
Different kernel functions are used for various kinds of data. Note that an extra dimension (feature) is added by the transformation in the picture, although this feature is never materialized in memory.
(Illustration from Chris Thornton, U. Sussex.)