I am using scikit-learn SVC to classify some data. I would like to speed up training.
from sklearn import svm
clf = svm.SVC(cache_size=4000, probability=True, verbose=True)
Since scikit-learn interfaces with libsvm and libsvm uses OpenMP, I was hoping that:
export OMP_NUM_THREADS=16
would make training run on multiple cores.
Unfortunately this did not help.
Any ideas?
Thanks
There is no OpenMP support in the current binding for libsvm in scikit-learn. However, if you have performance issues with sklearn.svm.SVC, you should very likely use a more scalable model instead.
If your data is high dimensional it might be linearly separable. In that case it is advisable to first try simpler models such as naive Bayes models or sklearn.linear_model.Perceptron, which are known to be very fast to train. You can also try sklearn.linear_model.LogisticRegression and sklearn.svm.LinearSVC, both implemented using liblinear, which is more scalable than libsvm albeit less memory efficient than other linear models in scikit-learn.
If your data is not linearly separable, you can try sklearn.ensemble.ExtraTreesClassifier (adjust the n_estimators parameter to trade off training speed against predictive accuracy).
Alternatively, you can try to approximate an RBF kernel using the RBFSampler transformer of scikit-learn and fitting a linear model on the output: http://scikit-learn.org/dev/modules/kernel_approximation.html
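As a rough sketch of that last suggestion, something like the following could work. The toy dataset (make_moons), the gamma and n_components values, and the choice of LinearSVC as the linear model are all assumptions for illustration, not tuned settings or anything taken from the question, so treat them as a starting point only.

```python
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data that is not linearly separable, standing in for the real dataset (assumption).
X, y = make_moons(n_samples=2000, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# RBFSampler maps the inputs to random Fourier features that approximate an RBF kernel.
# gamma and n_components are illustrative values, not recommendations.
approx_rbf_svm = make_pipeline(
    RBFSampler(gamma=2.0, n_components=300, random_state=0),
    LinearSVC(C=1.0, max_iter=5000),  # linear model fitted on the approximate kernel features
)
approx_rbf_svm.fit(X_train, y_train)

test_accuracy = approx_rbf_svm.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

More components usually give a closer approximation of the kernel at the cost of a slower fit, so n_components is the main knob to trade accuracy for speed here.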