I’m trying to learn how to use Python’s multiprocessing package, but I don’t understand the difference between map and imap.
Is the difference that map returns, say, an actual array or set, while imap returns an iterator over an array or set? When would I use one over the other?
Also, I don’t understand what the chunksize argument is. Is this the number of values that are passed to each process?
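That is the difference. Pool.map returns a list, while Pool.imap returns an iterator that yields the results in order as they become available. One reason to use imap instead of map is if you want to start processing the first few results without waiting for the rest to be calculated. map blocks until every result is ready before it returns.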
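Here is a minimal sketch of the contrast. The function slow_square is a hypothetical stand-in for real work, and the sleep only simulates a computation that takes longer for larger inputs. With map, nothing is printed until all five results exist. With imap, each value is printed as soon as it (and every value before it) has been computed, so the elapsed times should increase step by step. Exact timings will vary by machine.

```python
import time
from multiprocessing import Pool

def slow_square(x):
    # Hypothetical workload: larger inputs take longer to "compute"
    time.sleep(0.1 * x)
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map blocks until every result is ready, then returns a list
        results = pool.map(slow_square, range(5))
        print(results)  # [0, 1, 4, 9, 16]

        # imap returns an iterator; results arrive in input order,
        # each one as soon as it and all earlier results are done
        start = time.perf_counter()
        for value in pool.imap(slow_square, range(5)):
            print(f"{time.perf_counter() - start:.1f}s: {value}")
```

If you do not care about order, Pool.imap_unordered yields each result as soon as any worker finishes it.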
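As for chunksize: it is the number of items from the iterable that are bundled together into a single task and sent to a worker at once. It is not the total number of values each process receives; a worker that finishes one chunk picks up another, so one process may handle many chunks. Larger chunks are sometimes more efficient because every time a worker requests more work there is IPC and synchronization overhead, and fewer, larger tasks mean paying that overhead less often.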
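A rough way to see the overhead is to time many trivial tasks at different chunk sizes. This is a sketch, not a benchmark: tiny_task is a hypothetical function that does almost no work, so the per-task IPC cost dominates, and the absolute numbers depend entirely on your machine. imap's default chunksize is 1, so the first run pays the overhead once per item.

```python
import time
from multiprocessing import Pool

def tiny_task(x):
    # Hypothetical, nearly free computation: IPC overhead dominates
    return x + 1

if __name__ == "__main__":
    items = range(20_000)
    with Pool(processes=4) as pool:
        for chunksize in (1, 100, 5_000):
            start = time.perf_counter()
            # Each task sent to a worker is a batch of `chunksize` items;
            # results still come back in input order
            total = sum(pool.imap(tiny_task, items, chunksize=chunksize))
            elapsed = time.perf_counter() - start
            print(f"chunksize={chunksize:>5}: {elapsed:.2f}s (sum={total})")
```

The sum is the same for every chunk size; only the time spent shipping work to the workers changes.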