I am trying to use parallel nodes to run numerical simulations. I have nodes #0 through #12 and I want to use each one individually to run a separate part of the simulation. Essentially, I need to evaluate f(x) for x = 1 through 4 on one node, then f(x) for x = 5 through 9 on the next node, then f(x) for x = 10 through 14 on the one after that, and so on. Initially, I tried using a loop like:
n=0
while [ $n -le 12 ]
do
ssh compute-0-$n
#evaluate the f(x) for the x values that I want
exit
n=$(($n+1))
done
But this did not work: whenever the script ran ssh compute-0-$n to jump to a node, the connection to the original shell script seemed to cease, and only when I exited the node did the script continue along its merry way. I suppose there is a better way to accomplish this, but I am relatively new to this. Can anyone help?
The first thing to understand is that when you run ssh in the foreground (without a trailing &), your script waits until ssh finishes. ssh opens a new shell on the remote host, and that shell reads commands, but not the remaining commands from the script that launched it. The ssh session has no knowledge of that script; it’s waiting for commands from stdin.
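The same behaviour can be seen without any remote host. The following is only a local analogy, assuming a plain sh stands in for the remote shell that ssh starts; the variable names from_stdin and from_arg are made up for illustration.

```shell
#!/bin/sh
# Local analogy: a plain `sh` stands in for the remote shell that ssh starts.

# 1. Like a bare `ssh host`: the child shell takes its commands from stdin,
#    so it only runs what is fed to it there.
from_stdin=$(echo 'echo "ran in child shell"' | sh)

# 2. Like `ssh host command`: the command is handed over as an argument,
#    so the child runs that command and then exits.
from_arg=$(sh -c 'echo "ran in child shell"' </dev/null)

echo "stdin:    $from_stdin"
echo "argument: $from_arg"
```

In the original loop, the lines after ssh compute-0-$n are neither fed to stdin nor passed as an argument, so the remote shell never sees them.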
You need to do three things:
1. Take the computation from the body of your loop and put it into its own script (call it docompute.sh).
2. Install docompute.sh on each compute node, in a directory in the $PATH variable of the executing user, and
3. Replace the body of your loop with ssh compute-0-$n docompute.sh &. The & will get you the parallelism you want, by running the ssh process in the background.
See "running same script over many machines" for a discussion of something quite similar. The use of & to run the command in the background is key there.