I need to hold a 50,000×50,000 sparse matrix/2d-array, with ~5% of the cells, uniformly

Question

Asked: June 15, 2026

I need to hold a 50,000×50,000 sparse matrix/2D array in which ~5% of the cells, uniformly distributed, are non-empty.

Edit: I need to do this in numpy/scipy, sorry if that wasn’t clear. I have also added requirements.

I will need to:

Read the 5% non-empty data from a DB and assign it to matrix/2D-array cells as quickly as possible.
Use as little memory as possible.
Use fancy indexing (take the indexes and values of all non-empty cells in a column, say). This is nice-to-have; memory and construction time are more important.
Once constructed, the matrix will not change.
I will, however, want to take its transpose, preferably in O(1) memory and time.

What’s the most efficient way of achieving this?
Can I hold NaNs instead of zeros to indicate “empty” cells (0 is a valid value for me), and can I efficiently run nansum and nanmean?
If not, can I efficiently take the indexes and values of all non-zeros in a given column/row?


1 Answer

Editorial Team · Answer 1 · 2026-06-15T15:03:47+00:00

Well, for my purposes it seems like CSC (scipy.sparse.csc_matrix) is the way to go. Even at a 5% “sparsity factor”, the memory taken by the row indexes in CSC is still worth it. Here’s the code I used to test that the operations I need really are fast:

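import random

import numpy as np
import scipy.sparse as sp


def build_csc(N, SPARSITY_FACTOR):
    # Build a random N x N CSC matrix column by column.
    data = []
    row_indexes = []
    # column_indexes[j] is the offset in data where column j starts.
    column_indexes = [0] * (N + 1)

    current_index = 0
    for j in range(N):
        column_indexes[j] = current_index
        for i in range(N):
            if random.random() < SPARSITY_FACTOR:
                row_indexes.append(i)
                data.append(random.random())
                current_index += 1
    column_indexes[N] = current_index

    # np.float was removed from NumPy; np.float64 is the same type.
    return sp.csc_matrix((data, row_indexes, column_indexes), shape=(N, N), dtype=np.float64)


def take_from_col(m, col_index):
    # Row indexes and values of the non-zero entries in one column.
    # Note: nonzero() skips explicitly stored zeros.
    col = m[:, col_index]
    indexes = col.nonzero()[0]
    values = col[indexes]  # still a sparse (len(indexes) x 1) matrix
    return indexes, values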

Running this in %timeit shows that this is indeed fast.
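Since the question mentions loading the data from a DB and treating 0 as a valid value, here is a minimal sketch of how that could look. It assumes the DB rows arrive as three parallel arrays of (row, column, value). The helper names csc_from_triplets, column_entries and csc_nbytes are made up for illustration and are not part of scipy. Reading a column straight from indptr, instead of going through nonzero(), keeps explicitly stored zeros, so "empty" simply means "not stored" and no NaN marker is needed.

```python
import numpy as np
import scipy.sparse as sp


def csc_from_triplets(rows, cols, vals, n):
    """Build an n x n CSC matrix from (row, col, value) triplets.

    Hypothetical helper: the triplets stand in for rows fetched from the DB.
    Explicit zeros in vals stay stored, so a stored 0.0 is still "non-empty".
    """
    coo = sp.coo_matrix((vals, (rows, cols)), shape=(n, n), dtype=np.float64)
    return coo.tocsc()


def column_entries(m, j):
    """Row indexes and values stored in column j of a CSC matrix.

    Slices the underlying arrays directly, so stored zeros are included.
    """
    start, end = m.indptr[j], m.indptr[j + 1]
    return m.indices[start:end], m.data[start:end]


def csc_nbytes(m):
    """Bytes used by the three arrays that back a CSC matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


# Tiny stand-in for the DB result: four stored cells, one of them a real 0.0.
rows = np.array([0, 2, 3, 1])
cols = np.array([1, 1, 0, 3])
vals = np.array([1.5, 0.0, 2.0, 4.0])

m = csc_from_triplets(rows, cols, vals, 4)

idx, col_vals = column_entries(m, 1)
# Mean over the stored cells only, which plays the role of nanmean here.
col_mean = col_vals.mean() if col_vals.size else float("nan")

# The transpose of a CSC matrix is a CSR matrix.
mt = m.T
```

As a rough estimate under the same assumptions (float64 values, 32-bit indexes), 5% of 50,000×50,000 is 125 million stored cells, so the data and row-index arrays come to about 1.5 GB, against 20 GB for a dense float64 array. As far as I know, m.T on a CSC matrix returns a CSR matrix that reuses the same arrays rather than copying them, but it is worth checking with np.shares_memory(m.data, m.T.data) on your scipy version.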
