Editorial Team
Asked: June 17, 2026

I’m working with a very large sparse matrix multiplication (matmul) problem. As an example let’s say:

  • A is a binary (75 x 200,000) matrix. It’s sparse, so I’m using CSC for storage. I need to do the following matmul operation:

  • B = A.transpose() * A

  • The output is going to be a sparse, symmetric matrix of size 200K x 200K.

Unfortunately, B is going to be way too large to store in RAM (or “in core”) on my laptop. On the other hand, I’m lucky because B has some properties that should help solve this problem.

Since B is going to be sparse and symmetric about the diagonal, I could store only one triangle (upper or lower) of the matmul result, and a sparse matrix storage format could reduce the size further.

My question is: can NumPy or SciPy be told, ahead of time, what the output storage requirements are going to look like, so that I can select a storage solution and avoid the “matrix is too big” runtime error after several minutes (or hours) of calculation?

In other words, can storage requirements for the matrix multiply be approximated by analyzing the contents of the two input matrices using an approximate counting algorithm?

  • https://en.wikipedia.org/wiki/Approximate_counting_algorithm

If not, I’m looking into a brute-force solution: something involving map/reduce, out-of-core storage, or a matmul subdivision solution (Strassen’s algorithm) from the following web links:

A couple of Map/Reduce problem subdivision solutions:

  • http://www.norstad.org/matrix-multiply/index.html
  • http://bpgergo.blogspot.com/2011/08/matrix-multiplication-in-python.html

An out-of-core (PyTables) storage solution:

  • Very large matrices using Python and NumPy

A matmul subdivision solution:

  • https://en.wikipedia.org/wiki/Strassen_algorithm
  • http://facultyfp.salisbury.edu/taanastasio/COSC490/Fall03/Lectures/FoxMM/example.pdf
  • http://eli.thegreenplace.net/2012/01/16/python-parallelizing-cpu-bound-tasks-with-multiprocessing/

Thanks in advance for any recommendations, comments, or guidance!

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 5:50 am

    Since you are after the product of a matrix with its transpose, the value at [m, n] is basically going to be the dot product of columns m and n in your original matrix.

    I am going to use the following matrix as a toy example:

    >>> import numpy as np
    >>> a = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    ...               [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
    ...               [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]])
    >>> np.dot(a.T, a)
    array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
           [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
           [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 2]])
    

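    It is of shape (3, 12) and has 7 non-zero entries. The product of its transpose with it is of course of shape (12, 12) and has 16 non-zero entries, 6 of them on the diagonal. Because the product is symmetric, you only need to keep one triangle: the 6 diagonal entries plus half of the 10 off-diagonal ones, so it only requires storage of 11 elements.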
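
    The counts quoted above can be checked directly with scipy.sparse. This is only a small sketch on the toy matrix; the variable names (a_sp, b, nnz_upper and so on) are illustrative and not part of the original answer.

```python
import numpy as np
import scipy.sparse as sp

a = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]])

a_sp = sp.csc_matrix(a)
b = (a_sp.T @ a_sp).tocsr()   # sparse product, shape (12, 12)
b.eliminate_zeros()           # defensive: drop any explicitly stored zeros

nnz_input = a_sp.nnz                            # 7 non-zeros in a
nnz_full = b.nnz                                # 16 non-zeros in a.T @ a
nnz_diag = int(np.count_nonzero(b.diagonal()))  # 6 of them on the diagonal
nnz_upper = sp.triu(b, k=0).nnz                 # 11 entries in the upper triangle

# Each entry of the product is the dot product of two columns of a.
m, n = 4, 11
assert b[m, n] == a[:, m] @ a[:, n]
```

    Keeping only the output of sp.triu is one way to get the triangular storage the question asks about.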

    You can get a good idea of what the size of your output matrix is going to be in one of two ways:

    CSR FORMAT

    If your original matrix has C non-zero columns (columns with at least one non-zero entry), your new matrix will have at most C**2 non-zero entries. C of those are on the diagonal and are guaranteed to be non-zero, and of the remaining C**2 - C entries you only need to keep half, so a triangular store holds at most (C**2 + C) / 2 non-zero elements. Of course, many of the off-diagonal entries will actually be zero, so this is probably a gross overestimate.

    If your matrix is stored in csr format, the indices attribute of the corresponding scipy object holds the column indices of all non-zero elements, so you can easily compute the above estimate as:

    >>> import scipy.sparse
    >>> a_csr = scipy.sparse.csr_matrix(a)
    >>> a_csr.indices
    array([ 2, 11,  1,  7, 10,  4, 11])
    >>> np.unique(a_csr.indices).shape[0]
    6
    

    So there are 6 columns with non-zero entries, and the estimate is at most 36 non-zero entries (21 if you keep only one triangle), way more than the real 16 (or 11).
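
    As a minimal sketch, the column-count bound can be wrapped in a small helper. The name column_bound is hypothetical, and the helper assumes the matrix stores no explicit zeros, since those would be counted as non-empty columns.

```python
import numpy as np
import scipy.sparse as sp

def column_bound(a_csr):
    """Upper bounds on the non-zeros of a.T @ a from the number of non-empty columns."""
    c = np.unique(a_csr.indices).shape[0]  # columns with at least one stored entry
    full = c * c                           # bound for the full product
    triangle = (c * c + c) // 2            # bound when only one triangle is kept
    return full, triangle

a = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]])

full, triangle = column_bound(sp.csr_matrix(a))  # (36, 21) for the toy matrix
```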

    CSC FORMAT

    If instead of the column indices of the non-zero elements we have their row indices, we can make a better estimate. For the dot product of two columns to be non-zero, they must both have a non-zero element in the same row. If there are R non-zero elements in a given row, they account for R**2 non-zero elements of the product. When you sum this over all rows, you are bound to count some elements more than once, so this is also an upper bound.

    The row indices of the non-zero elements of your matrix are in the indices attribute of a sparse csc matrix, so this estimate can be computed as follows:

    >>> a_csc = scipy.sparse.csc_matrix(a)
    >>> a_csc.indices
    array([1, 0, 2, 1, 1, 0, 2])
    >>> rows, where = np.unique(a_csc.indices, return_inverse=True)
    >>> where = np.bincount(where)
    >>> rows
    array([0, 1, 2])
    >>> where
    array([2, 3, 2])
    >>> np.sum(where**2)
    17
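
    The same row-count bound can be read off a sparse matrix without going through a dense array, and turned into a rough memory figure. This is a sketch under stated assumptions: row_bound and csr_bytes are hypothetical helper names, the matrix is assumed to store no explicit zeros, and the byte count assumes CSR storage with 8-byte values and 4-byte indices (scipy may switch to 8-byte indices for very large results).

```python
import numpy as np
import scipy.sparse as sp

def row_bound(a_sparse):
    """Upper bound on the non-zeros of a.T @ a: sum over rows of R**2."""
    a_csr = sp.csr_matrix(a_sparse)
    per_row = np.diff(a_csr.indptr).astype(np.int64)  # stored entries per row
    return int(np.sum(per_row ** 2))

def csr_bytes(nnz, n_rows, value_bytes=8, index_bytes=4):
    """Approximate bytes needed for a CSR matrix with nnz stored entries."""
    # data and indices hold one item per entry; indptr holds n_rows + 1 items.
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes

a = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]])

bound = row_bound(sp.csc_matrix(a))
approx_bytes = csr_bytes(bound, n_rows=a.shape[1])
```

    For the toy matrix the bound is again 17.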
    

    This is darn close to the real 16! And it is no coincidence that, for a binary matrix, this estimate is the same as the sum of all the entries of the product:

    >>> np.sum(np.dot(a.T,a),axis=None)
    17
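
    The reason the two numbers agree can be written out in a few steps. The notation is introduced here for illustration: A is the binary input matrix, r indexes its rows, m and n index its columns, and R_r is the number of non-zero entries in row r.

```latex
\sum_{m,n} \left(A^{\mathsf T} A\right)_{mn}
  = \sum_{m,n} \sum_{r} A_{rm} A_{rn}
  = \sum_{r} \Bigl(\sum_{m} A_{rm}\Bigr) \Bigl(\sum_{n} A_{rn}\Bigr)
  = \sum_{r} R_r^{2},
\qquad R_r = \sum_{m} A_{rm}.
```

    Every non-zero entry of the product is an integer of at least 1, so the number of non-zero entries cannot exceed this sum. The two are equal only when no entry of the product exceeds 1. In the toy example the single entry equal to 2, at [11, 11], accounts for the gap between 17 and 16.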
    

    In any case, the following code should allow you to see that the estimation is pretty good:

    import numpy as np
    import scipy.sparse

    def estimate(a):
        # Upper bound on the non-zeros of a.T * a: sum of R**2 over all rows.
        a_csc = scipy.sparse.csc_matrix(a)
        _, where = np.unique(a_csc.indices, return_inverse=True)
        where = np.bincount(where)
        return np.sum(where**2)

    def test(shape=(10, 1000), count=100):
        # Random binary matrix; repeated indices can leave fewer than `count` ones.
        a = np.zeros(np.prod(shape), dtype=int)
        a[np.random.randint(np.prod(shape), size=count)] = 1
        print('a non-zero = {0}'.format(np.sum(a)))
        a = a.reshape(shape)
        print('a.T * a non-zero = {0}'.format(
            np.flatnonzero(np.dot(a.T, a)).shape[0]))
        print('csc estimate = {0}'.format(estimate(a)))
    
    >>> test(count=100)
    a non-zero = 100
    a.T * a non-zero = 1065
    csc estimate = 1072
    >>> test(count=200)
    a non-zero = 199
    a.T * a non-zero = 4056
    csc estimate = 4079
    >>> test(count=50)
    a non-zero = 50
    a.T * a non-zero = 293
    csc estimate = 294
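
    If the bound is not tight enough, the exact count can also be obtained without ever holding the whole product in memory, by computing it one block of output rows at a time. The following is only a sketch of that subdivision idea, not part of the original answer: exact_nnz_blockwise is a hypothetical helper, and the default block_cols is an arbitrary choice that would need tuning to the available RAM.

```python
import numpy as np
import scipy.sparse as sp

def exact_nnz_blockwise(a_sparse, block_cols=10_000):
    """Count the non-zeros of a.T @ a, in total and in the upper triangle,
    computing only block_cols rows of the product at a time."""
    a_csc = sp.csc_matrix(a_sparse)   # cheap column slicing
    a_csr = a_csc.tocsr()
    n = a_csc.shape[1]
    total = 0
    upper = 0
    for start in range(0, n, block_cols):
        stop = min(start + block_cols, n)
        # Rows start..stop-1 of the product, as coordinate triplets.
        block = (a_csc[:, start:stop].T @ a_csr).tocoo()
        keep = block.data != 0
        total += int(np.count_nonzero(keep))
        # block.row is relative to the block, so shift it before comparing.
        upper += int(np.count_nonzero(keep & (block.row + start <= block.col)))
    return total, upper

a = np.array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]])

total, upper = exact_nnz_blockwise(sp.csc_matrix(a), block_cols=5)  # (16, 11)
```

    Each block could equally be written to disk instead of being counted, which connects this to the out-of-core options listed in the question.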
    