So I have a few machines on the network running MongoDB:
- I can easily write code to connect to one from my PC and return a result set, e.g.:
from pymongo import Connection c = Connection("10.130.10.12") some_data = c.MyData.MyCollection.find_one()
- If I have, say 100 servers to connect to, and want to put this in a loop, that’s easy too:
all_data = [] for server in my_list_of_servers: c = Connection(server) all_data.append(c.MyData.MyCollection.find_one())
- However this does it one-by-one and could be quite slow.
- How can I send out all the requests at once? I’m super unfamiliar with threading (is that what I should even be looking into?)
The multiprocessing library is designed for this sort of task. The final
mapcall can be replaced by a for loop if you like.A description of using the multiprocessing module for Map/Reduce tasks in general is described here:
http://mikecvet.wordpress.com/2010/07/02/parallel-mapreduce-in-python/