My question is regarding the multiprocessing module of Python.
In its simplest form, my question is about the strange behaviour of the following code:
import numpy as np
from multiprocessing import Pool
x = np.random.random(100)
y = np.random.random(100)
y2 = y.copy()  # snapshot of y; y[:] would only be a view of the same data
def I(i):
    y[i] = x[i]
pool = Pool()
pool.map(I,range(100))
After the execution, my hope is that y = x.
However, we get y = y2. (The assignments are not working.)
Why is this happening?
What is the best way to compute f(x[i]) and assign it to y[i]?
The behavior you’re seeing is not so surprising if you think about what is being synchronized between the processes used by Pool to do your work. Only the arguments and return values of the I function are synchronized in your current code, so it makes sense that x and y keep their original values in the calling process.

I suspect your current code is a minimal test case, which is troublesome because there’s not really a meaningful implementation of copying one array to another using Pool.map. A trivial solution, though I’m not sure it generalizes to whatever your real task is, is to map a function that simply returns its argument over x and assign the result into y.

This passes each value of x through to another process (where nothing is done with it), and the result values are passed back and assigned into y (pool.map returns a list). It’s pretty silly.

A slightly more sophisticated approach might copy x over to the worker processes, using the initializer and initargs arguments of the Pool constructor.

Note, though, that x is only copied one way. If the I function were to modify its value, the changes would not be synchronized between processes.

If your task really does require synchronized access to both the source and target arrays, you might try multiprocessing.Array. I don’t have any direct experience with it, but it should be possible to replace y with a synchronized version of itself. Unfortunately, I suspect the synchronization will slow your program down, so don’t do it unless you really need to!
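
As a minimal sketch of the trivial solution described above (returning values from the workers and assigning the list from pool.map in the parent), something like the following should work. The helper name copy_via_map and the choice of two worker processes are assumptions made for illustration, not part of the original code.

```python
import numpy as np
from multiprocessing import Pool


def I(value):
    # Runs in a worker process. Nothing is done with the value; it is
    # simply returned, so it gets pickled and sent back to the parent.
    return value


def copy_via_map(values):
    # Hypothetical helper: the assignment happens in the parent process,
    # using the list that pool.map returns.
    result = np.empty(len(values))
    with Pool(processes=2) as pool:
        result[:] = pool.map(I, values)
    return result


if __name__ == "__main__":
    x = np.random.random(100)
    y = copy_via_map(x)
    print(np.array_equal(x, y))
```

The important point is that no worker ever writes to y; only the parent does, after the results have come back.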
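
The initializer and initargs approach might look roughly like this. It is a sketch under assumptions: init_worker, compute and the squaring function f are hypothetical names standing in for your real computation, and the results are still collected from the return value of pool.map rather than written by the workers.

```python
import numpy as np
from multiprocessing import Pool

_x = None  # set once in every worker process by init_worker


def init_worker(x_value):
    # Called once per worker when the pool starts; stores a private copy
    # of x in that worker. Changes made here never reach the parent.
    global _x
    _x = x_value


def f(i):
    # Placeholder computation (an assumption): square the i-th element.
    return _x[i] ** 2


def compute(x):
    # x is copied one way, from the parent to each worker, via initargs.
    with Pool(processes=2, initializer=init_worker, initargs=(x,)) as pool:
        return np.array(pool.map(f, range(len(x))))


if __name__ == "__main__":
    x = np.random.random(100)
    y = compute(x)
    print(np.allclose(y, x ** 2))
```

Compared with the trivial version, only the indices travel to the workers for each task; x itself is sent once per worker process.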
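
For the multiprocessing.Array suggestion, one possible shape is sketched below; treat it as an untuned illustration, not a recommendation. The names init_worker and copy_into_shared are hypothetical, and the shared array is handed to the workers through initargs so that they all write into the same block of memory.

```python
import numpy as np
from multiprocessing import Array, Pool

_x = None         # per-worker copy of the source array
_y_shared = None  # shared target array, set by init_worker


def init_worker(x_value, y_shared):
    global _x, _y_shared
    _x = x_value
    _y_shared = y_shared


def I(i):
    # Element assignment on a synchronized Array acquires its lock, and
    # the write lands in shared memory that the parent can read.
    _y_shared[i] = float(_x[i])


def copy_into_shared(x):
    # "d" means C doubles; the array starts out zero-filled.
    y_shared = Array("d", len(x))
    with Pool(processes=2, initializer=init_worker,
              initargs=(x, y_shared)) as pool:
        pool.map(I, range(len(x)))
    # Copy the shared buffer into an ordinary NumPy array.
    with y_shared.get_lock():
        return np.frombuffer(y_shared.get_obj()).copy()


if __name__ == "__main__":
    x = np.random.random(100)
    y = copy_into_shared(x)
    print(np.array_equal(x, y))
```

Every write goes through the lock, which is the slowdown mentioned above, so this only pays off when the workers genuinely need to write into a common array.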