In Python 2.6+, the multiprocessing module offers a Pool class, so one can do:
class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        return pool.imap(...)
However, with the standard Python implementation (CPython) at 2.7.2, this approach soon leads to “IOError: [Errno 24] Too many open files”. Apparently the pool object never gets garbage collected, so its worker processes never terminate and whatever descriptors they open internally keep accumulating. I believe this because the following works:
class Volatile(object):
    def do_stuff(self, ...):
        pool = multiprocessing.Pool()
        result = pool.map(...)
        pool.terminate()
        return result
I would like to keep the “lazy” iterator approach of imap; how does the garbage collector work in that case? How can I fix the code?
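One possible way to keep the lazy behaviour is to wrap the imap iterator in a generator that terminates the pool in a finally clause. This is only a minimal sketch, not a definitive fix: the helper name imap_then_terminate and the example function square are made up for illustration, and the caller is assumed to create the pool and hand it in.

```python
import multiprocessing


def imap_then_terminate(pool, func, iterable, chunksize=1):
    """Yield the results of pool.imap lazily, then terminate the pool.

    Hypothetical helper: `pool` is created by the caller and must not be
    reused after the returned generator has finished.
    """
    try:
        for result in pool.imap(func, iterable, chunksize):
            yield result
    finally:
        # Runs when the results are exhausted, when an error propagates,
        # or when the generator is closed or garbage collected early.
        pool.terminate()


def square(x):
    # Defined at module level so it can be pickled for the worker processes.
    return x * x


if __name__ == "__main__":
    pool = multiprocessing.Pool(2)
    for value in imap_then_terminate(pool, square, range(5)):
        print(value)
```

The pool is only terminated once the generator finishes or is discarded, so a caller that keeps a half-consumed generator alive also keeps the worker processes alive.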
In the end, I passed the pool reference around and terminated it manually once the pool.imap iterator was finished.
In case anyone stumbles upon this solution in the future: the chunksize parameter is very important in Pool.imap (as opposed to plain Pool.map, where it didn’t matter). I manually set it so that each process receives 1 + len(input) / len(pool) jobs. Leaving it at the default chunksize=1 gave me the same performance as if I didn’t use parallel processing at all… bad.
I guess there’s no real benefit to using ordered imap vs. ordered map; I just personally like iterators better.
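For reference, the chunk size rule above might be written like this. It is a rough sketch: chunksize_for is a hypothetical helper, and since a Pool object has no len(), the number of worker processes is passed in explicitly. Pool.map picks a chunk size on its own when none is given, which is probably why the setting only mattered for imap.

```python
import multiprocessing


def chunksize_for(num_items, num_workers):
    """Return 1 + num_items / num_workers, rounded down to an integer."""
    # `//` keeps the result an int on both Python 2 and Python 3.
    return 1 + num_items // num_workers


if __name__ == "__main__":
    inputs = list(range(-10, 10))          # a sized sequence, so len() works
    workers = multiprocessing.cpu_count()  # number of worker processes
    pool = multiprocessing.Pool(workers)
    try:
        chunksize = chunksize_for(len(inputs), workers)
        for value in pool.imap(abs, inputs, chunksize):
            print(value)
    finally:
        pool.terminate()
```

Note that this rule needs the input length up front, so it only applies when the input is a list or another sized sequence rather than an arbitrary iterator.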