If I have a multiprocess system that needs to process a bunch of directories, 1 directory per process, how likely is it that two processes will happen to grab the same directory?
Say I have dir/1 all the way to dir/99. I figure that if I touch a .claimed file in the dir that the process is working on, there won’t be conflicts. Are there problems with my approach?
There’s a bit more complexity. It’s not only multi-process, but it’s distributed across several computers.
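I recall that directory creation is atomic: mkdir either creates the directory or fails because it already exists, so exactly one process can win the claim. A plain touch doesn't give you that, because it succeeds whether or not the file is already there. File creation can be made exclusive too (open with O_CREAT|O_EXCL), but I recall that being unreliable on some network filesystems, older NFS in particular, so your .claimed ought to be a directory. Since you're spread across several machines, check how your shared filesystem behaves before relying on either.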
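Here is a minimal sketch of the directory-as-claim idea, assuming a POSIX-like filesystem where mkdir fails if the target already exists. The function name try_claim is made up for illustration, and .claimed is just the marker name from the question.

```python
import os


def try_claim(work_dir):
    """Return True if this process claimed work_dir, False if someone else did.

    Relies on mkdir failing when the directory already exists, so only
    one caller can succeed. (Assumption: the filesystem honours this.)
    """
    try:
        os.mkdir(os.path.join(work_dir, ".claimed"))
    except FileExistsError:
        # Another process got there first; move on to the next directory.
        return False
    return True
```

Each worker would loop over dir/1 to dir/99 and only process the directories where try_claim returns True. There is no check-then-create step, so there is no window in which two processes both think the directory is free.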
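I’d take a different approach: list all the directories you want to process and write that list to a pipe, which then acts as a work queue that each process reads from. IIRC the pipe semantics (named or anonymous) are that a read consumes the data it returns, so two processes will never read the same bytes. The catch is that a read is not guaranteed to stop at a line boundary, so use fixed-size records (or something equivalent) to keep a worker from getting half a path. Also note that a pipe only connects processes on one machine, so with several computers you’d need a queue per host or some network-visible equivalent.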
A master process could write the list to a pipe and spawn the worker processes, or the worker processes could just block trying to read until you manually write the list to the pipe.
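A rough sketch of the master-plus-workers variant, assuming a Unix system (it uses os.fork) and a single machine. RECORD_SIZE, encode, worker and run are names invented for this example. Each path is padded to a fixed-size record smaller than PIPE_BUF, so that each write is atomic and each read should return exactly one whole record.

```python
import os

RECORD_SIZE = 64  # hypothetical fixed record width, well below PIPE_BUF


def encode(text):
    """Pad text with NUL bytes to exactly RECORD_SIZE bytes."""
    data = text.encode()
    if len(data) > RECORD_SIZE:
        raise ValueError("record too long: " + text)
    return data.ljust(RECORD_SIZE, b"\0")


def worker(work_fd, result_fd):
    """Take records off the work pipe until every writer has closed it."""
    while True:
        record = os.read(work_fd, RECORD_SIZE)
        if not record:  # EOF: the queue is drained and closed
            break
        path = record.rstrip(b"\0").decode()
        # Real processing of `path` would go here.
        os.write(result_fd, encode(f"{os.getpid()}:{path}"))


def run(paths, num_workers=4):
    """Fork workers, feed them paths, return a list of (pid, path) pairs."""
    work_r, work_w = os.pipe()
    res_r, res_w = os.pipe()
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child process
            os.close(work_w)
            os.close(res_r)
            try:
                worker(work_r, res_w)
            finally:
                os._exit(0)
        pids.append(pid)
    os.close(work_r)
    os.close(res_w)
    # Assumption: the whole list fits in the pipe buffers. A long list
    # would need the master to interleave writing and reading results.
    for path in paths:
        os.write(work_w, encode(path))
    os.close(work_w)  # lets the workers see EOF once the queue is empty
    results = []
    while True:
        record = os.read(res_r, RECORD_SIZE)
        if not record:
            break
        pid_text, path = record.rstrip(b"\0").decode().split(":", 1)
        results.append((int(pid_text), path))
    os.close(res_r)
    for pid in pids:
        os.waitpid(pid, 0)
    return results
```

Calling run([f"dir/{i}" for i in range(1, 100)]) should hand each directory to exactly one worker. Which worker gets which directory is up to the scheduler, so the pairing will differ from run to run.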