Editorial Team
Asked: June 1, 2026

I am using Python’s multiprocessing module to process large numpy arrays in parallel. The arrays are memory-mapped using numpy.load(mmap_mode='r') in the master process. After that, multiprocessing.Pool() forks the process (I presume).

Everything seems to work fine, except I am getting lines like:

AttributeError("'NoneType' object has no attribute 'tell'",)
  in `<bound method memmap.__del__ of
       memmap([ 0.57735026,  0.57735026,  0.57735026,  0.        ,  0.        ,        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,        0.        ,  0.        ], dtype=float32)>`
     ignored

in the unittest logs. The tests pass fine, nevertheless.

Any idea what’s going on there?

Using Python 2.7.2, OS X, NumPy 1.6.1.


UPDATE:

After some debugging, I tracked the cause down to a code path that was using a (small slice of) this memory-mapped numpy array as input to a Pool.imap call.

Apparently the “issue” is with the way multiprocessing.Pool.imap passes its input to the new processes: it uses pickle. This doesn’t work with memory-mapped numpy arrays, and something inside breaks, which leads to the error.

I found this reply by Robert Kern which seems to address the same issue. He suggests creating a special code path for when the imap input comes from a memory-mapped array: memory-mapping the same array manually in the spawned process.

This would be so complicated and ugly that I’d rather live with the error and the extra memory copies. Is there any other way that would be lighter on modifying existing code?


1 Answer

    Editorial Team
    Added an answer on June 1, 2026 at 12:00 pm

    My usual approach (if you can live with extra memory copies) is to do all the I/O in one process and then send the data out to a pool of worker processes. To load a slice of a memmapped array into memory, just do x = np.array(data[yourslice]). (data[yourslice].copy() doesn’t actually do this, which can lead to some confusion.)
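As a minimal sketch of the difference, assuming a throwaway scratch file (the path and array contents below are made up for illustration), slicing a memmap gives another memmap, while wrapping the slice in np.array gives a plain in-memory ndarray:

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example.dat')
np.arange(1000, dtype=np.float64).tofile(path)

data = np.memmap(path, dtype=np.float64, mode='r')

# Slicing keeps the memmap subclass, so the result is still file-backed.
view = data[10:20]

# np.array(...) copies the slice into an ordinary in-memory ndarray.
in_memory = np.array(data[10:20])

print(type(view).__name__)
print(type(in_memory).__name__)
```

The in-memory copy no longer depends on the open file, so it can be handed to other processes like any other array.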

    First off, let’s generate some test data:

    import numpy as np
    np.random.random(10000).tofile('data.dat')
    

    You can reproduce your errors with something like this:

    import numpy as np
    import multiprocessing
    
    def main():
        # np.float64 matches the data written by np.random.random(...).tofile()
        data = np.memmap('data.dat', dtype=np.float64, mode='r')
        pool = multiprocessing.Pool()
        # Each chunk is a memmap slice, which imap pickles to send to a worker.
        results = pool.imap(calculation, chunks(data))
        results = np.fromiter(results, dtype=np.float64)
    
    def chunks(data, chunksize=100):
        """Overly-simple chunker..."""
        # list(...) keeps this working on both Python 2 and Python 3
        intervals = list(range(0, data.size, chunksize)) + [None]
        for start, stop in zip(intervals[:-1], intervals[1:]):
            yield data[start:stop]
    
    def calculation(chunk):
        """Dummy calculation."""
        return chunk.mean() - chunk.std()
    
    if __name__ == '__main__':
        main()
    

    And if you just switch to yielding np.array(data[start:stop]) instead, you’ll fix the problem:

    import numpy as np
    import multiprocessing
    
    def main():
        # np.float64 matches the data written by np.random.random(...).tofile()
        data = np.memmap('data.dat', dtype=np.float64, mode='r')
        pool = multiprocessing.Pool()
        results = pool.imap(calculation, chunks(data))
        results = np.fromiter(results, dtype=np.float64)
    
    def chunks(data, chunksize=100):
        """Overly-simple chunker..."""
        # list(...) keeps this working on both Python 2 and Python 3
        intervals = list(range(0, data.size, chunksize)) + [None]
        for start, stop in zip(intervals[:-1], intervals[1:]):
            # Copy the slice into a plain in-memory ndarray before it is pickled.
            yield np.array(data[start:stop])
    
    def calculation(chunk):
        """Dummy calculation."""
        return chunk.mean() - chunk.std()
    
    if __name__ == '__main__':
        main()
    

    Of course, this does make an extra in-memory copy of each chunk.
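If the extra copy in the parent process matters, the approach mentioned in the question (re-opening the memory map inside each worker) might look roughly like the sketch below. This is only an illustration under assumptions: the names index_chunks and calculation_on_range are made up here, and the file is assumed to be a flat float64 array like the test data above. Only a small tuple of a path and two integers gets pickled, never the array itself.

```python
import multiprocessing

import numpy as np

def index_chunks(size, chunksize=100):
    """Yield (start, stop) index pairs covering range(size). Hypothetical helper."""
    for start in range(0, size, chunksize):
        yield start, min(start + chunksize, size)

def calculation_on_range(task):
    """Re-open the memmap in the worker and compute on one slice of it."""
    path, start, stop = task
    # Assumes a flat float64 file, as written by the test-data snippet.
    data = np.memmap(path, dtype=np.float64, mode='r')
    chunk = data[start:stop]
    return float(chunk.mean() - chunk.std())

def main(path):
    # Open the file once in the parent just to learn its length.
    size = np.memmap(path, dtype=np.float64, mode='r').size
    tasks = ((path, start, stop) for start, stop in index_chunks(size))
    pool = multiprocessing.Pool()
    try:
        results = pool.imap(calculation_on_range, tasks)
        return np.fromiter(results, dtype=np.float64)
    finally:
        pool.close()
        pool.join()
```

Call main('data.dat') from under an if __name__ == '__main__': guard, as in the examples above. Re-opening the map in every task is simple but repeats a little setup work each time; whether that is acceptable depends on the chunk size.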

    In the long run, you’ll probably find that it’s easier to switch away from memmapped files and move to something like HDF. This is especially true if your data is multidimensional. (I’d recommend h5py, but PyTables is nice if your data is “table-like”.)

    Good luck, at any rate!
