I’m looking for a fast SVD library in C, C++ or Java. Ultimately I’m using Java, but I’m very comfortable using JNA to wrap C++, e.g. http://github.com/hughperkins/jeigen
The library needs to handle sparse matrices. To keep this objective, so that the question doesn’t get marked as too subjective, let’s say:
- targeting use with news20.binary, e.g. from http://mldata.org/repository/data/viewslug/news20binary/
- how long does it take to run?
- how much variance is retained, e.g. for an S matrix of size 6 or 20 (that is, keeping only the top 6 or 20 singular values)?
I looked around at a few libraries and found:
- MATLAB: super fast, about 10 seconds, but it’s not really a ‘library’ as such. Average squared projection error: 0.93
- redsvd: super fast, about 1 second to run for 6 features, but the average squared projection error is 0.97, which is very high
- Eigen’s SVD is very slow and only works on dense matrices
- svdlibc: ran for 28 minutes before I stopped it; I guess it’s calculating the full S rather than just the first 6 features or so
Basically, I’m looking for a library that gives about the same speed and average squared projection error as MATLAB, or at least something comparable.
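For comparing libraries on the error figure quoted above, here is a minimal Java sketch of one way to compute it. The question does not define "average squared projection error", so this assumes the relative form, 1 - (sum of the k retained squared singular values) / (squared Frobenius norm of the matrix), with no mean-centering. The class and method names are made up for illustration. The squared Frobenius norm only needs the stored non-zero entries, so it is cheap for a sparse matrix.

```java
public final class ProjectionError {

    /** Squared Frobenius norm: sum of squares of the stored (non-zero) entries. */
    public static double squaredFrobeniusNorm(double[] nonZeroValues) {
        double sum = 0.0;
        for (double v : nonZeroValues) {
            sum += v * v;
        }
        return sum;
    }

    /**
     * Relative squared error of a rank-k approximation, assuming the
     * definition 1 - sum(sigma_i^2 for the k kept values) / ||A||_F^2.
     */
    public static double relativeError(double[] topSingularValues,
                                       double squaredFrobeniusNorm) {
        double retained = 0.0;
        for (double s : topSingularValues) {
            retained += s * s;
        }
        return 1.0 - retained / squaredFrobeniusNorm;
    }

    public static void main(String[] args) {
        // Toy example: the matrix diag(3, 4) has singular values 4 and 3.
        double[] entries = {3.0, 4.0};
        double[] top1 = {4.0}; // keep only the largest singular value
        System.out.println(relativeError(top1, squaredFrobeniusNorm(entries)));
    }
}
```

Under this definition, a value near 1 means the k retained components capture very little of the matrix, which is consistent with the 0.93 and 0.97 figures for only 6 features on a very high-dimensional data set.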
From my experience, svdlibc is the best library of those options. I’ve dug through its code before, and I don’t believe it calculates the full S matrix (i.e., it is a true “thin SVD”). If you can control the matrix representation on disk, svdlibc performs much faster with the sparse binary input format, because the I/O overhead is significantly lower.
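Since you are on Java, writing the sparse binary file yourself is fairly simple. The sketch below is based on my recollection of the layout: a header of three 4-byte ints (rows, columns, total non-zeros), then for each column its non-zero count followed by (row index, 4-byte float value) pairs, all in network byte order. Treat that layout and the 0-based row indices as assumptions and check them against the SVDLIBC documentation before relying on it. The class and method names are made up for illustration. Java's `DataOutputStream` writes big-endian, which is network byte order.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public final class SparseBinaryWriter {

    /**
     * Writes a column-major sparse matrix in the ASSUMED sparse binary layout.
     * colRows[c] holds the row indices of the non-zeros in column c,
     * colVals[c] holds the matching values.
     */
    public static void write(OutputStream out, int numRows,
                             int[][] colRows, float[][] colVals) throws IOException {
        int total = 0;
        for (int[] rows : colRows) {
            total += rows.length;
        }
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(numRows);          // header: number of rows
        data.writeInt(colRows.length);   // header: number of columns
        data.writeInt(total);            // header: total non-zero count
        for (int c = 0; c < colRows.length; c++) {
            data.writeInt(colRows[c].length);   // non-zeros in this column
            for (int i = 0; i < colRows[c].length; i++) {
                data.writeInt(colRows[c][i]);   // row index (assumed 0-based)
                data.writeFloat(colVals[c][i]); // value as a 4-byte float
            }
        }
        data.flush();
    }

    /** Convenience wrapper that returns the encoded bytes in memory. */
    public static byte[] toBytes(int numRows, int[][] colRows, float[][] colVals) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            write(buffer, numRows, colRows, colVals);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

For a data set the size of news20.binary you would pass a `BufferedOutputStream` wrapped around a `FileOutputStream` to `write` rather than building the bytes in memory.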
The S-Space Package provided an executable jar around SVDLIBJ, the Java port of SVDLIBC. However, they found that it produced different results from SVDLIBC for certain inputs.