I need to limit the number of processes being executed in parallel. For instance, I’d like to execute this pseudo-command line:
export POOL_PARALLELISM=4
for i in `seq 100` ; do
pool foo -bar &
done
pool foo -bar # would not complete until the first 100 finished.
Therefore, despite 101 foos being queued up to run, only 4 would be running at any given time. pool would fork()/exit() and queue the remaining processes until they complete.
Is there a simple mechanism to do this with Unix tools? at and batch don’t apply because they generally fire at the top of the minute, and they run jobs sequentially. A queue isn’t necessarily the best fit either, because I want these to be synchronous.
Before I write a C wrapper employing semaphores and shared memory, and then debug the deadlocks I’ll surely introduce, can anyone recommend a bash/shell or other tool mechanism to accomplish this?
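For comparison, the pool behaviour described above can be approximated in plain bash without a C wrapper. This is a minimal sketch, assuming bash 4.3 or later (for `wait -n`). `pool` and `POOL_PARALLELISM` are the question’s hypothetical names, implemented here as a shell function, and `work` is a made-up stand-in for `foo -bar` that logs when it starts and finishes:

```shell
#!/usr/bin/env bash
# Minimal sketch of the question's "pool" behaviour in plain bash.
# Assumes bash >= 4.3 (for `wait -n`). `pool` and POOL_PARALLELISM are the
# question's hypothetical names; `work` is a made-up stand-in for "foo -bar".
set -euo pipefail

POOL_PARALLELISM=${POOL_PARALLELISM:-4}
LOG=$(mktemp)

# Stand-in job: log when it starts and finishes so the run can be checked.
work() {
  echo "+ $1" >> "$LOG"
  sleep 0.05
  echo "- $1" >> "$LOG"
}

# Run "$@" in the background, but first block while POOL_PARALLELISM
# background jobs started by this shell are still running.
pool() {
  while (( $(jobs -rp | wc -l) >= POOL_PARALLELISM )); do
    wait -n || true   # wake up as soon as any one job exits; ignore its status
  done
  "$@" &
}

for i in $(seq 100); do
  pool work "$i"
done
wait   # like the question's final `pool foo -bar`: block until all jobs finish
echo "$(grep -c '^-' "$LOG") jobs finished"   # prints "100 jobs finished"
```

Unlike the pool in the question, this only limits jobs started from the same shell. Separate invocations don’t share a limit, which is what the semaphore and shared-memory design would add.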
There’s definitely no need to write this tool yourself; there are several good choices.
make

make can do this pretty easily, but it relies heavily on files to drive the process. (If you want to run some operation on every input file that produces an output file, this might be awesome.) The -j command-line option sets how many jobs may run at once. The -l command-line option sets a load-average limit: make won’t start new jobs while other jobs are running and the system load average is at or above that value. (That might be nice if you wanted to do some work “in the background”. Don’t forget about the nice(1) command, which can also help here.)

So, a quick (and untested) Makefile for image converting:

If you run this with make, it’ll run one job at a time. If you run it with make -j8, it’ll run eight jobs in parallel. If you run make -j with no number, it’ll start as many jobs as it can, potentially hundreds. (When compiling source code, I find that twice the number of cores is an excellent starting point. That gives each processor something to do while waiting for disk I/O requests. Different machines and different loads might work differently.)

xargs

xargs provides the --max-procs command-line option (-P for short). This is best if the parallel processes can be split apart based on a single input stream of either ASCII NUL-separated or newline-separated input. (Well, GNU xargs’ -d option lets you pick another delimiter, but these two are common and easy.) This lets you use find(1)’s powerful file-selection syntax rather than writing funny expressions like the Makefile example above, or lets your input be completely unrelated to files. (Consider a program for factoring large composite numbers into prime factors: making that task fit into make would be awkward at best, while xargs could do it easily.)

The earlier example might look something like this:
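A minimal sketch of the xargs approach follows, assuming GNU or BSD xargs (both accept -P). The inline `sh -c` script and the `LOG` file are hypothetical stand-ins for the real per-item work, such as converting one image. Each job just logs when it starts and finishes, so you can see that no more than four run at once:

```shell
#!/usr/bin/env bash
# Sketch: run 100 jobs through xargs, at most 4 at a time.
# Assumes GNU or BSD xargs; -P is the short form of GNU's --max-procs.
# The inline `sh -c` script is a hypothetical stand-in for real per-item
# work (e.g. converting one image); it just logs when it starts and ends.
set -euo pipefail

LOG=$(mktemp)
export LOG   # make the log path visible to the sh -c children

# -n 1 : hand each invocation one input line (it arrives as $1)
# -P 4 : keep up to 4 invocations running concurrently
seq 100 | xargs -n 1 -P 4 sh -c '
  echo "+ $1" >> "$LOG"   # job started
  sleep 0.05              # real work would go here
  echo "- $1" >> "$LOG"   # job finished
' job

# xargs only exits after every child has finished.
echo "$(grep -c '^-' "$LOG") jobs finished"   # prints "100 jobs finished"
```

To process files instead, feed it something like `find . -name '*.jpg' -print0` and add -0 to xargs, so that file names containing spaces or newlines are handled safely.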
parallel

The moreutils package (available at least on Ubuntu) provides the parallel command. It can run in two different ways: either running a specified command on different arguments, or running different commands in parallel. The previous example could look like this:

beanstalkd

The beanstalkd program takes a completely different approach. It provides a message bus for you to submit requests to, and job servers block waiting for jobs to be entered, execute them, and then go back to waiting for a new job on the queue. If you want to write data back to the specific HTTP request that initiated the job, this might not be very convenient, since you have to provide that mechanism yourself (perhaps via a different “tube” on the beanstalkd server). But if the end result is submitting data into a database, sending email, or something similarly asynchronous, this might be the easiest option to integrate into your existing application.