I am developing a simple recommendation system and trying to do some computation like SVD, RBM, etc.
To be more convincing, I am going to use the Movielens or Netflix dataset to evaluate the performance of the system. However, the two datasets both have more than 1 million of users and more than 10 thousand of items, it’s impossible to put all the data into memory. I have to use some specific modules to handle such a large matrix.
I know there are some tools in SciPy can handle this, and divisi2 used by python-recsys also seems like a good choice. Or maybe there are some better tools I don’t know?
Which module should I use? Any suggestion?
I would suggest SciPy, specifically Sparse. As Dougal pointed out, Numpy is not suited for this situation.