I wrote a function to modify a passed-in dictionary. However, when I parallelized the code using the multiprocessing module, it behaves differently than when run serially: the dictionary is not modified.
Attached below is a toy example of my issue. The dictionary is not modified when run using map_async, but is modified when run in a for loop. Thanks for clarifying my confusion!
#!/usr/bin/env python
from multiprocessing import Pool

def main1(x):
    x['a'] = 1
    print x
    return 1

def main2(x):
    x['b'] = 2
    print x

p = Pool(2)
d = {1:{}, 2:{}}
r = p.map_async(main1, d.values())
print r.get()
print "main1", d

for x in d.values():
    main2(x)
print "main2", d
r = p.map_async(main1, d.values()) does this:
1) Evaluate d.values(), which is [{}, {}].
2) Execute main1(item) for each item in that list on a worker from the pool.
3) Gather the results from those calls into a list, [1, 1], because that's what main1 returns.
4) Make that list available through r: map_async returns an AsyncResult, and r.get() yields the list.
So it does what the builtin function map() does, but in a parallelized way.
This means your dict d never makes it into any of the worker processes, because it's not a reference to d that's passed to map_async, and therefore to main1. Each item is pickled and sent to a worker, so main1 modifies a copy and the change is lost when the call returns.
And even if you passed in a reference to d, it wouldn't work, for the reasons explained by @Roland Smith.
The point is: you shouldn't modify the dictionary in the first place. It's not even very good style in conventional programming for functions to modify their arguments, even if they can. For parallel programming it's absolutely crucial to follow a functional programming style, which in this context means:
Functions should do the computation on their input, and return a result that is further processed.
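As a rough illustration of that principle, here is one way the toy example from the question could be restructured so that the worker returns its result instead of mutating its argument. This is only a sketch, written for Python 3 (note the print() calls); the names add_a and run are made up for this example and are not part of multiprocessing.

```python
from multiprocessing import Pool


def add_a(item):
    """Return a new (key, dict) pair instead of mutating the input."""
    key, value = item
    updated = dict(value)  # work on a copy; the input stays untouched
    updated['a'] = 1
    return key, updated


def run(d, processes=2):
    """Map add_a over d's items in worker processes and build a new dict."""
    with Pool(processes) as pool:
        result = pool.map_async(add_a, list(d.items()))
        # get() blocks until all workers are done and returns the list
        # of (key, dict) pairs, which the parent turns back into a dict.
        return dict(result.get())


if __name__ == '__main__':
    d = {1: {}, 2: {}}
    d = run(d)
    print("main1", d)
```

The parent process rebinds d to the returned result, so nothing depends on a worker being able to reach back into the parent's memory. Run as a script on Python 3.7 or later, this should print main1 {1: {'a': 1}, 2: {'a': 1}}.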
The functions map and reduce are very common in functional programming, and combined they form a pattern that is very well suited to distributed computing; see the Wikipedia article on MapReduce.
So in order to effectively parallelize your program it helps to try to think of your problem in terms of those functions.
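To make the shape of that pattern concrete, here is a minimal serial sketch using the builtin map and functools.reduce. The problem (a sum of squares) and the names square and sum_of_squares are chosen purely for illustration.

```python
from functools import reduce
from operator import add


def square(n):
    """Map step: depends only on its own input, no shared state."""
    return n * n


def sum_of_squares(numbers):
    mapped = map(square, numbers)   # map: transform each item independently
    return reduce(add, mapped, 0)   # reduce: combine the partial results


if __name__ == '__main__':
    print(sum_of_squares(range(1, 5)))
```

Because square touches nothing but its argument, the map step is the part that could be handed to a pool of workers, while the reduce step stays in the parent process.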
For a very concrete example, see the article The Trouble With Multicore in IEEE Spectrum. It describes a method of parallelizing the computation of pi that could easily be implemented with map/reduce.
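A possible map/reduce formulation of a pi computation is sketched below. It is not necessarily the method the article describes: as an assumption, it uses the Leibniz series pi/4 = 1 - 1/3 + 1/5 - ..., split into contiguous chunks of terms. The helper names partial_sum, chunks and estimate_pi are hypothetical, and the code targets Python 3.

```python
from functools import reduce
from multiprocessing import Pool
from operator import add


def partial_sum(bounds):
    """Sum the Leibniz terms (-1)^k / (2k + 1) for k in [start, stop)."""
    start, stop = bounds
    return sum((-1.0) ** k / (2 * k + 1) for k in range(start, stop))


def chunks(n_terms, n_chunks):
    """Split range(n_terms) into contiguous (start, stop) pairs.

    Assumes n_terms > 0 and n_chunks > 0.
    """
    step = -(-n_terms // n_chunks)  # ceiling division
    return [(lo, min(lo + step, n_terms)) for lo in range(0, n_terms, step)]


def estimate_pi(n_terms=1000000, processes=2):
    with Pool(processes) as pool:
        # map: each worker sums its own chunk and returns a float
        partials = pool.map(partial_sum, chunks(n_terms, processes))
    # reduce: the parent combines the partial sums
    return 4 * reduce(add, partials, 0.0)


if __name__ == '__main__':
    print(estimate_pi())
```

Each worker receives only a pair of integers and sends back a single number, so there is no shared data for the processes to fight over.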