The piece of code that I have looks some what like this:
glbl_array = # a 3 Gb array
def my_func( args, def_param = glbl_array):
#do stuff on args and def_param
if __name__ == '__main__':
pool = Pool(processes=4)
pool.map(my_func, range(1000))
Is there a way to make sure (or encourage) that the different processes does not get a copy of glbl_array but shares it. If there is no way to stop the copy I will go with a memmapped array, but my access patterns are not very regular, so I expect memmapped arrays to be slower. The above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stackoverflow and do not want to annoy the sysadmin. Do you think it will help if the the second parameter is a genuine immutable object like glbl_array.tostring().
You can use the shared memory stuff from
multiprocessingtogether with Numpy fairly easily:which prints
However, Linux has copy-on-write semantics on
fork(), so even without usingmultiprocessing.Array, the data will not be copied unless it is written to.