What is the best way to efficiently remove the columns of a sparse matrix that contain only zeros? I have created a matrix and filled it with data:
matrix = sp.sparse.lil_matrix((100, 100))
I now want to remove roughly the last 20 columns, which contain only zeros. How can I do this?
If this were just a NumPy array `X`, you could write `X != 0`, which gives a boolean array of the same shape as `X`. You could then index `X` with that boolean array, i.e. `non_zero_entries = X[X != 0]`. But this is a sparse matrix, which does not support boolean indexing. `X != 0` also won't give you what you want: it just returns a single boolean value, which seems to be true only if the two operands are the exact same object in memory. (This describes older SciPy releases; newer ones implement elementwise comparisons for sparse matrices.)

What you want is the `nonzero()` method, which sparse matrices provide just as NumPy arrays do. It returns a tuple `(row_indices, col_indices)` for the nonzero entries. If you only want the columns that contain nonzero entries, take the second array of that tuple (index 1, the column indices). You also need to account for repeated indices, because a column with more than one nonzero entry appears more than once:
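Here is a minimal sketch of that approach. It assumes SciPy's `scipy.sparse` module and uses a small made-up matrix whose only nonzeros sit in columns 5 and 79. The data is hypothetical and only for illustration. Converting to CSC before selecting columns is a choice made here because CSC stores data column by column; the approach above does not require it. The last lines show an alternative, which is also an assumption rather than part of the answer above. If you literally want to drop only the trailing all-zero columns (keeping any all-zero columns in the middle), slice up to the last column that has a nonzero entry.

```python
import numpy as np
from scipy import sparse

# Hypothetical example data: a 100 x 100 LIL matrix with nonzeros only in columns 5 and 79
matrix = sparse.lil_matrix((100, 100))
matrix[0, 5] = 1.0
matrix[3, 5] = 2.0    # second entry in column 5, so index 5 appears twice in cols
matrix[10, 79] = 3.0

# nonzero() returns (row_indices, col_indices) of the nonzero entries
rows, cols = matrix.nonzero()

# np.unique removes the repeated column indices (and sorts them)
nonzero_cols = np.unique(cols)

# Keep only the columns that contain at least one nonzero entry
trimmed = matrix.tocsc()[:, nonzero_cols]
print(trimmed.shape)  # (100, 2)

# Alternative: drop only the trailing all-zero columns
n_keep = nonzero_cols[-1] + 1 if nonzero_cols.size else 0
trailing_trimmed = matrix.tocsc()[:, :n_keep]
print(trailing_trimmed.shape)  # (100, 80)
```

The first variant also removes all-zero columns that sit between nonzero ones. Use the second if column positions must stay aligned with the original matrix.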