I have a set of sparse matrices filled with boolean values that I need to perform logical operations on (mostly element-wise OR).
As in NumPy, summing matrices with dtype='bool' gives the element-wise OR; however, there's a nasty side effect:
>>> from scipy import sparse
>>> [a,b] = [sparse.rand(5,5,density=0.1,format='lil').astype('bool')
... for x in range(2)]
>>> b
<5x5 sparse matrix of type '<class 'numpy.bool_'>'
with 2 stored elements in LInked List format>
>>> a+b
<5x5 sparse matrix of type '<class 'numpy.int8'>'
with 4 stored elements in Compressed Sparse Row format>
The data type gets changed to 'int8', which causes problems for future operations. This can be worked around with:
(a+b).astype('bool')
But I get the impression that all this type changing would cause a performance hit.
Why is the dtype of the result different from the operands?
And is there a better way to do logical operations on sparse matrices in Python?
Logical operations are not supported for sparse matrices, but converting back to 'bool' is not all that expensive. In fact, with LIL format matrices, the conversion may even appear to take negative time because of fluctuations in the timings.
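To check this on your own machine, here is a minimal timing sketch. The matrix size, density, and repeat count are arbitrary choices made for illustration, not values from the original measurements, and the absolute numbers will depend on your hardware and SciPy version (on some versions the sum may already come back as 'bool').

```python
import timeit
from scipy import sparse

# Two random boolean LIL matrices; size and density are illustrative only.
a, b = [sparse.rand(500, 500, density=0.01, format='lil').astype('bool')
        for _ in range(2)]

# Time the plain sum against the sum followed by a cast back to bool.
t_add = timeit.timeit(lambda: a + b, number=20)
t_add_cast = timeit.timeit(lambda: (a + b).astype('bool'), number=20)

print('a + b                 :', t_add)
print("(a + b).astype('bool'):", t_add_cast)

# The cast result is what you would keep for further logical operations.
result = (a + b).astype('bool')
```

If the two timings come out nearly equal, or the second is occasionally the smaller one, the cast is lost in the noise of the LIL to CSR conversion.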
You may have noticed that your LIL matrices were converted to CSR format before being added together; look at the format of the result. If you had been using CSR format to begin with, the conversion overhead becomes more noticeable.
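The same comparison can be sketched for matrices that start out in CSR format, so that no LIL to CSR conversion hides the cost of the cast. Again, the sizes and repeat counts are assumptions made for the example, and no particular timings are implied.

```python
import timeit
from scipy import sparse

# Same experiment, but starting from CSR so no format conversion is needed.
a_csr, b_csr = [sparse.rand(500, 500, density=0.01, format='csr').astype('bool')
                for _ in range(2)]

total = a_csr + b_csr
print(total.format, total.dtype)  # format stays CSR; dtype depends on SciPy version

# Time the addition and the cast separately to see their relative cost.
t_add = timeit.timeit(lambda: a_csr + b_csr, number=200)
t_cast = timeit.timeit(lambda: total.astype('bool'), number=200)

print('a + b         :', t_add)
print("astype('bool'):", t_cast)
```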
CSR (and CSC) matrices have a data attribute, a 1D array that holds the actual non-zero entries of the sparse matrix, so the cost of recasting your sparse matrix depends on the number of non-zero entries, not on its size.
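A short sketch of what that looks like in practice: two CSR matrices of very different shapes but roughly the same number of stored entries have data arrays of about the same length, and that array is what gets recast. The shapes and densities below are made up for the illustration.

```python
from scipy import sparse

# Very different shapes, but densities chosen so both store about 1000 entries.
small = sparse.rand(100, 100, density=0.1, format='csr')
large = sparse.rand(2000, 2000, density=0.00025, format='csr')

for name, m in [('small', small), ('large', large)]:
    # data is a 1D array with one element per stored entry.
    print(name, m.shape, 'nnz =', m.nnz, 'data.shape =', m.data.shape)

# Casting the matrix recasts this 1D array, so the work scales with nnz.
small_bool = small.astype('bool')
print(small_bool.data.dtype)
```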