I currently have a global Lock = threading.Lock(), and make the following call:
Parallel(n_jobs=2)(delayed(serialRemove)(dir,c,b,l,f) for f in os.listdir(dir))
using joblib. In serialRemove, I have:
Lock.acquire()
print(f+' begin')
if h in hashes:
    try:
        os.remove(path)
        if l: print('Removing ' + path)
        removed += 1
    except os.error:
        print('Encountered error removing file')
else:
    hashes.add(h)
print(f+' end')
Lock.release()
Part of the output looks like this:
10.txt begin
11.txt begin
20.txt begin
I don’t understand how there can be two consecutive begin prints if the code is surrounded by a Lock. Is there an easy way to protect the code block so that, ideally, I get:
10.txt begin
10.txt end
11.txt begin
11.txt end
20.txt begin
20.txt end
threading.Lock only works between threads of the same process. Without actually knowing what library you’re using for parallelism here, it’s hard to be sure, but it’s almost certainly executing the tasks in separate processes. (Anything that starts threads in the same process, at least with CPython, isn’t going to get any effective parallelism for CPU-bound code, because of the GIL, so these libraries generally default to processes.)
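To illustrate the first sentence: inside a single process, a threading.Lock does keep each begin/end pair together. The following is a minimal sketch, assuming the tasks run in a standard-library ThreadPoolExecutor rather than in the library from the question; guarded_task, run_in_threads and events are hypothetical names used only for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
events = []  # in-memory log shared by all threads of this one process


def guarded_task(name):
    """Record a begin/end pair for `name` while holding the lock."""
    with lock:  # released automatically, even if the body raises
        events.append(name + " begin")
        events.append(name + " end")
    return name


def run_in_threads(names, n_workers=2):
    """Run guarded_task for every name on a small thread pool."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(guarded_task, names))


if __name__ == "__main__":
    run_in_threads(["10.txt", "11.txt", "20.txt"])
    print("\n".join(events))
```

The order in which the files are handled is not guaranteed, but every begin is immediately followed by its matching end. Because of the GIL, this gives mutual exclusion but no real speedup for CPU-bound work.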
So, if you try to use a global threading.Lock object from other processes, you’re going to get a completely independent lock in each process, and locking it doesn’t do any good. (With some parallel libraries, possibly depending on the platform, you’ll get an error instead. But there’s no way it could possibly do what you want.)
Most parallelization libraries have their own lock types that work with their style of multiprocessing. If yours does, use the one that comes with your library.
If not, depending on how your library works, multiprocessing.Lock may do the trick.
If that doesn’t work either, you’ll have to implement something explicitly using, e.g., a lock file (possibly together with flock/lockf, or relying on Windows exclusive open, or whatever).
Also, note that at least one of the libraries with an API that matches your example line of code, joblib, is explicitly designed for tasks that do not share anything, and therefore isn’t supposed to work with locks at all. (It probably will work with multiprocessing.Lock anyway, but you really shouldn’t count on that.)
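For the lock-file fallback mentioned above, one possible shape is a context manager built on exclusive file creation. This is a rough sketch, not a hardened implementation: lock_file and its poll_interval and timeout parameters are hypothetical, and it relies on os.open with O_CREAT | O_EXCL failing when the file already exists instead of using flock/lockf.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path, poll_interval=0.05, timeout=10.0):
    """Hold a cross-process lock represented by the existence of `path`."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Exclusive creation: raises FileExistsError if another
            # holder has already created the file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire " + path)
            time.sleep(poll_interval)
    try:
        yield
    finally:
        # Release the lock by closing and deleting the file.
        os.close(fd)
        os.remove(path)
```

Each task would then wrap its critical section in something like `with lock_file(lock_path):`, where lock_path is a location every worker agrees on. One known weakness of this approach is that a process that dies while holding the lock leaves the file behind, so later callers time out until it is removed by hand.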