1) I have read that if I import the threading module in python, CPU bound loads won’t see much benefit from using this library because the GIL forces threads to run 1 at a time even if I run code on a multi-core machine. If this is the case what sort of code would benefit from using Python’s threading library?
2) If that’s the case for the threading library, then to perform CPU intensive tasks in parallel, such as cross-correlations of two signals, is the multiprocessing module the best module to use?
To make this more concrete, let’s say the task I’d like to parallelize is the for loop in the following code, and my machine has only 12 cores. Suppose my template has a length of ~1000, my image has a length of ~2000 and I have ~1000 signals to sort through:
import numpy as np
###2-D array of shape (points, signals)
signals = np.load('signals.npy')
###1-D template array for cross correlation
templateSignal = np.load('template.npy')
for s in range(signals.shape[2]):
xcorr = np.correlate(templateSignal, signal[:,s])
Even with the GIL, threading in Python is useful because input/output operations don’t block the program. You can perform operations while waiting for the completion of a disk operation or while waiting for a network event.
Threads also play a role in GUI applications, where a program can stay responsive to user input while performing calculations in the backgrounds (thanks @FogleBird)
As for 2), you are correct in your assumption that you can spread CPU-intensive programs to multiple cores with the
multiprocessingmodule. Be aware of the costs of communication between processes.