I have a script that reads a file with roughly 700,000 lines. For each line it returns a list of values calculated from that line. Before I tried multiprocessing, I used a for loop and added each line's values to a global variable (because in the end I am after a sum). Unfortunately, with the multiprocessing module I cannot just add to the global variable, because the workers are separate processes and each one has its own copy. Instead, I had each process return the values I am after and used Pool.map to build a huge list of the returned values. Then I could loop through that list and get the sums I am after. This is very memory intensive, since the list holds one entry per line. Any suggestions? I realize this is probably hard to read, so I can clarify if needed. Thanks!
Keep an accumulator in each process, then at the end add up all those accumulators. You only need to store one value per process.
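Here is a minimal sketch of that idea, not a drop-in solution. It assumes each line holds whitespace-separated numbers and that you want element-wise sums; `process_line`, `sum_stride`, `combine`, and `parallel_sums` are hypothetical names made up for illustration, and `process_line` stands in for whatever your real per-line calculation is. Each worker handles every `n_workers`-th line, keeps its own running totals, and returns only those totals, so `Pool.map` ends up holding one small list per worker instead of one per line.

```python
import multiprocessing as mp
from functools import partial
from itertools import islice


def process_line(line):
    # Hypothetical stand-in for the real per-line calculation:
    # here we just parse whitespace-separated numbers.
    return [float(x) for x in line.split()]


def sum_stride(path, n_workers, worker_id):
    """Accumulate sums over lines worker_id, worker_id + n_workers, ..."""
    totals = []
    with open(path) as f:
        # islice skips to this worker's first line, then steps by n_workers
        for line in islice(f, worker_id, None, n_workers):
            values = process_line(line)
            # Grow the accumulator if this line has more values than seen so far
            if len(values) > len(totals):
                totals.extend([0.0] * (len(values) - len(totals)))
            for i, v in enumerate(values):
                totals[i] += v
    return totals


def combine(partials):
    """Add the per-worker accumulators element-wise."""
    width = max((len(p) for p in partials), default=0)
    result = [0.0] * width
    for p in partials:
        for i, v in enumerate(p):
            result[i] += v
    return result


def parallel_sums(path, n_workers=4):
    # One task per worker; each task returns a single accumulator
    worker = partial(sum_stride, path, n_workers)
    with mp.Pool(n_workers) as pool:
        partials = pool.map(worker, range(n_workers))
    return combine(partials)
```

You would call something like `parallel_sums("data.txt", n_workers=4)`, where the file name is a placeholder. On platforms that start workers with spawn (Windows, and macOS by default) put that call under an `if __name__ == "__main__":` guard. The trade-off in this sketch is that every worker still reads through the whole file and skips the lines that are not its own, which is usually fine when the calculation costs more than the read. If reading dominates, splitting the file by byte ranges would be an alternative.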