I tried to use LSI to generate vectors that represent documents. I am using the svd function from the SciPy library, but the program throws a MemoryError. The size of my matrix is 100 * 13057. Is this too big for my 8 GB of RAM?
I searched for this problem on Stack Overflow. Somebody said I just have to install 64-bit Python on my 64-bit OS (right now I have 32-bit Python on a 64-bit OS), but re-installing all the libraries is too tedious. Another suggestion is to convert the matrix to a sparse matrix.
Does anyone have an idea about this problem? Thanks!
# Build one row per document from its term vector.
raw_matrix = []
for text in forest_lsi:
    raw_matrix.append(text.get_vector())
from svd import compute_svd
print("The size of raw matrix: " + str(len(raw_matrix)) + " * " + str(len(raw_matrix[0])))
matrix = compute_svd(raw_matrix)
The console output is as follows:
The size of raw matrix: 100 * 13057
Original matrix:
[[1 1 2 ..., 0 0 0]
[0 3 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
...,
[0 0 0 ..., 0 0 0]
[0 0 1 ..., 0 0 0]
[0 0 2 ..., 1 1 3]]
Traceback (most recent call last):
  File "D:\workspace\PyQuEST\src\Practice\baseline_lsi.py", line 93, in <module>
    matrix = compute_svd( raw_matrix )
  File "D:\workspace\PyQuEST\src\Practice\svd.py", line 12, in compute_svd
    U, s, V = linalg.svd( matrix )
  File "D:\Program\Python26\lib\site-packages\scipy\linalg\decomp_svd.py", line 79, in svd
    full_matrices=full_matrices, overwrite_a = overwrite_a)
MemoryError
Your V matrix will take 13057*13057*8 bytes of memory if you're using the default dtype=np.float, which is approx. 1.4 GB. My hunch is that that's too large for your 32-bit Python. Try using 32-bit floating point numbers, that is dtype=np.float32, to cut memory use in half, or start using scipy.sparse (almost always a good idea for information retrieval problems).
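To make the arithmetic concrete, and to show roughly what the scipy.sparse route could look like, here is a minimal sketch. The helper names full_v_bytes and sparse_lsi and the choice of k are hypothetical and only for illustration. Using scipy.sparse.linalg.svds for a truncated SVD is one possible way to follow the sparse suggestion, not necessarily what was meant above. The sparse matrix is kept in float64 here because it is small once stored sparsely.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds


def full_v_bytes(n_cols, itemsize):
    """Bytes needed to hold the full n_cols x n_cols V matrix."""
    return n_cols * n_cols * itemsize


n_terms = 13057
print("float64 V: %.2f GB" % (full_v_bytes(n_terms, 8) / 1e9))  # float64 V: 1.36 GB
print("float32 V: %.2f GB" % (full_v_bytes(n_terms, 4) / 1e9))  # float32 V: 0.68 GB


def sparse_lsi(raw_matrix, k):
    """Rank-k truncated SVD of a document-term matrix stored sparsely.

    Hypothetical helper: k must satisfy 1 <= k < min(matrix shape).
    Only k singular triplets are computed, so the full square V
    matrix is never allocated.
    """
    a = sparse.csr_matrix(np.asarray(raw_matrix, dtype=np.float64))
    u, s, vt = svds(a, k=k)
    # Sort explicitly so the largest singular value comes first.
    order = np.argsort(s)[::-1]
    return u[:, order], s[order], vt[order, :]
```

With the matrix from the question, a call such as sparse_lsi(raw_matrix, k=50) would return a 100 x 50 U, 50 singular values, and a 50 x 13057 Vt, which together occupy only a few megabytes.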