My main FORTRAN MPI code reaches a point where all processes call a script. The code looks something like
write(syscommand,'(a13,1x,i3)') './vscript.csh', my_mpi_proc_num
rc=system(syscommand)
Now, this section of code loops through over a hundred times, and the script runs fine on all processes. Then, randomly as far as I can tell, some process will enter system and return an error code of 32512. A few other things then happen (sorry I can’t show much more code. My employer would not be too happy.), then MPI_ABORT is called and all the processes die. I am told that 32512 is often the error code returned when a command cannot be found. This is unlikely because, as I have indicated, the script is found hundreds of times before this crash, and nothing is moving it around.
I seem to have found a stopgap measure:
write(syscommand,'(a13,1x,i3)') './vscript.csh', my_mpi_proc_num
rc=32512
num_attempts=0
do while (num_attempts<100 .and. rc==32512)
num_attempts=num_attempts+1
rc=system(syscommand)
enddo
That is, each process will try up to 100 times to get past the 32512 thing. Although I am sure this is horrible code, it is working.
So, anyone have a clue why I am getting this error? A thought: If two processes try to run the same script near simultaneously, will one of them be kicked out and forced to return that 32512? Thanks.
Probably your compiler implements the system intrinsic as a call to the POSIX system(3) function provided by the system library. This call returns an integer number that is organized as follows.
The last line is the important one.
The return code 32512 is 0x7F00, i.e. the exit status of the child shell is 127. In the Bourne shell and other UNIX shells this means the command was not found in PATH and is not a built-in shell command (see this question). It is also known as the “command not found” error.
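The arithmetic above can be checked with a small sketch. Python is used here purely for illustration, since its os module exposes the same wait-status macros as the C library. The helper name decode_wait_status is hypothetical, and the manual bit layout (exit code in bits 8 to 15, signal number in the low 7 bits) is an assumption that holds on Linux and most UNIX systems but is not guaranteed by POSIX.

```python
import os


def decode_wait_status(status: int) -> dict:
    """Split a raw wait status, as returned by system(3), into its parts."""
    exited = os.WIFEXITED(status)        # True if the child called exit()
    signaled = os.WIFSIGNALED(status)    # True if the child was killed by a signal
    return {
        "exited_normally": exited,
        "exit_code": os.WEXITSTATUS(status) if exited else None,
        "signal": os.WTERMSIG(status) if signaled else None,
    }


status = 32512
# Manual decoding, assuming the common Linux layout of the status word.
manual_exit_code = (status >> 8) & 0xFF
manual_signal = status & 0x7F

print(hex(status), decode_wait_status(status))
print("manual:", manual_exit_code, manual_signal)
```

For 32512 both routes agree: the child shell exited normally with exit code 127 and was not killed by a signal.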
If anything could be tampering with your PATH variable, this could be the cause. You could try replacing ./vscript.csh and all the commands within it with their absolute paths.
On some MPI implementations, spawning processes from MPI processes is not supported. We have seen issues with some versions of OpenMPI. If you call fork() or system() from an OpenMPI program you will get a warning:
On the other hand, a recent version of the OpenMPI FAQ claims that
This limitation is not specific to OpenMPI and affects every implementation that relies on the OpenFabrics stack.
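Returning to the absolute-path suggestion: one way to narrow down where the 127 comes from is to record what the child shell would have seen at the moment of failure. The following is only a sketch in Python, not the original Fortran. The function name run_with_diagnostics and the choice of what to log are assumptions made for illustration. It runs the command through the shell, as system(3) does, and prints the working directory, PATH, and the state of the script file whenever the exit code is 127.

```python
import os
import shlex
import subprocess
import sys


def run_with_diagnostics(script: str, rank: int) -> int:
    """Run `script rank` through the shell and return the shell's exit code.

    The script name is converted to an absolute path before it is handed
    to the shell, so the shell does not have to resolve a relative name.
    """
    path = os.path.abspath(script)
    command = f"{shlex.quote(path)} {int(rank)}"
    exit_code = subprocess.run(command, shell=True).returncode

    if exit_code == 127:  # the shell could not find something it was asked to run
        print(f"[rank {rank}] exit code 127 from: {command}", file=sys.stderr)
        print(f"  cwd        = {os.getcwd()}", file=sys.stderr)
        print(f"  PATH       = {os.environ.get('PATH', '<unset>')}", file=sys.stderr)
        print(f"  exists     = {os.path.isfile(path)}", file=sys.stderr)
        print(f"  executable = {os.access(path, os.X_OK)}", file=sys.stderr)
    return exit_code
```

A call such as run_with_diagnostics("./vscript.csh", 3) would then show, on a failing attempt, whether the script itself was missing or whether the file was present. In the latter case the 127 more likely came from a command inside the script, which is where absolute paths for those commands would help.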