Question
What is the most efficient way of calling multiple commands from a Java program?
Background
My team and I have been tasked with creating a program that eventually calls a series of commands (anywhere from one hundred to ten thousand, or possibly more) on a UNIX system. The commands are not simple, built-in commands; they belong to a software package that is already installed on the machine. Because these commands are fairly work-intensive and performance is a key factor in the success of our software, I have been researching the most efficient way of calling multiple commands from Java. Unfortunately, I have yet to find a single post, question, or forum thread that discusses calling multiple commands from a performance perspective.
Knowledge
I am very familiar with IO in Java, and have worked with the Runtime, Process, and ProcessBuilder classes before.
What I’m Looking for
I am looking for a high- or low-level explanation (pseudocode only, please, if low-level) of how best to optimize the calling of multiple commands from within a Java program. I am unable to post our code on the web, but I do not believe it is necessary in this situation. For simplicity, feel free to assume that the command we are calling is cmd, which takes the arguments -a arg0 -b arg1. It may be helpful to know that the command strings are generated earlier in the program and do not change based on the results of other calls. It may also help to know that the result of each call is a string, and that all of the results are to be added to an ArrayList.
Thank you so much for your help.
I think the main issue here is running as many commands in parallel as the machine can usefully handle. That will often make a far bigger difference in performance than any difference in exactly how the commands are managed.
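If you do stay in pure Java, bounded parallelism could look roughly like the sketch below. It is only an illustration under assumptions: the class and method names (ParallelCommands, runOne, runAll) are made up, echo stands in for your real cmd, and the right pool size depends on how CPU- or I/O-heavy the commands are.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCommands {

    /** Runs one command and returns its combined stdout/stderr, trimmed. */
    static String runOne(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout so only one pipe must be drained
                .start();
        // Drain the output before waitFor(); otherwise a chatty child can block on a full pipe.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output.trim();
    }

    /** Runs all commands with at most maxParallel at a time; results keep the input order. */
    static List<String> runAll(List<List<String>> commands, int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (List<String> command : commands) {
                tasks.add(() -> runOne(command));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task is done and returns futures in task order.
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for commands", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A command could not be run", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // "echo" is a stand-in for the real command (e.g. cmd -a arg0 -b arg1).
        List<List<String>> commands = List.of(
                List.of("echo", "result 1"),
                List.of("echo", "result 2"));
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(runAll(commands, workers));
    }
}
```

Each pool thread spends most of its time blocked on a child process, so the pool size is really a cap on concurrent processes. Starting with the number of cores and measuring from there is a reasonable first guess, not a rule.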
Generally, I agree with the idea of using a script of some sort, and having Java directly run only the script’s interpreter.
In particular, in this sort of situation, I have sometimes used parallel make. The makefile can specify the commands to run, and any dependencies between them. For example, in a situation in which I needed to do hundreds of simulations, I had rules that made the raw simulation reports each depend on a simulation control file, and the processed reports depend on the raw reports.
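Since your command strings are already generated inside the program, the makefile itself could be generated from Java. The following is a rough sketch, not what I actually used: the class name MakefileSketch, the out/N.txt naming scheme, and the temp-file-then-rename step are all my own assumptions, and your commands have no dependencies between them, so each rule has no prerequisites.

```java
import java.util.List;

class MakefileSketch {

    /** Builds makefile text with one rule per command; rule i writes its output to out/i.txt. */
    static String generate(List<String> commandLines) {
        StringBuilder targets = new StringBuilder();
        StringBuilder rules = new StringBuilder();
        for (int i = 0; i < commandLines.size(); i++) {
            String target = "out/" + i + ".txt";
            targets.append(' ').append(target);
            // A literal '$' must be doubled so that make passes it through to the shell.
            String escaped = commandLines.get(i).replace("$", "$$");
            rules.append(target).append(":\n")
                 // Recipe lines must begin with a tab character.
                 .append("\tmkdir -p out\n")
                 // Write to a temp file and rename, so a failed command never
                 // leaves behind a target that looks finished.
                 .append('\t').append(escaped).append(" > $@.tmp && mv $@.tmp $@\n");
        }
        // "all" is the first target, so it is what a plain "make" builds.
        return "all:" + targets + "\n" + rules;
    }

    public static void main(String[] args) {
        System.out.print(generate(List.of("cmd -a arg0 -b arg1", "cmd -a arg2 -b arg3")));
    }
}
```

After make finishes, the Java side would read each out/N.txt back into the ArrayList. Because a target that already exists is not rebuilt, rerunning make only executes the commands whose output files are missing.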
Parallel make can make it easy and efficient to resume work after a failure without redoing all the work that was successfully completed. It can also limit parallelism to a reasonable number of concurrent jobs.
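From the Java side, this reduces to starting a single process. Here is a minimal sketch, assuming a make that accepts -f and -j (GNU make does) and a hypothetical makefile named commands.mk; the class and method names are placeholders too.

```java
import java.io.IOException;
import java.util.List;

class ParallelMake {

    /** Builds the argument list for a make run limited to the given number of jobs. */
    static List<String> makeCommand(String makefile, int jobs) {
        return List.of("make", "-f", makefile, "-j", String.valueOf(jobs));
    }

    /** Starts make, lets it share this JVM's stdout/stderr, and returns its exit code. */
    static int run(String makefile, int jobs) throws IOException, InterruptedException {
        Process make = new ProcessBuilder(makeCommand(makefile, jobs))
                .inheritIO() // no pipes to drain: make writes straight to our console
                .start();
        return make.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "commands.mk" is a hypothetical file name; the job count is only a starting guess.
        int jobs = Runtime.getRuntime().availableProcessors();
        int exitCode = run("commands.mk", jobs);
        System.out.println("make exited with " + exitCode);
    }
}
```

A nonzero exit code means at least one rule failed. Running the same command again picks up from the targets that are still missing or out of date.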