I have a problem that I asked about previously without getting far. I have since found new information and am hoping for more help. The code is a hybrid MPI/OpenMP code that crashes with a segmentation fault when run across several nodes, although it works when executed on a single node (the one the master process is spawned from). The problem uses static arrays, and I discovered that if they are “too big” the code seg faults, but if they are “small” everything runs fine. As a test I also converted the code to dynamic memory allocation, and this solves the problem: no matter the size (even the larger sizes that failed with static arrays), the code runs to completion. This is not a long-term solution, though. The test code is only that, a test code; there is a much bigger code that exhibits the same behavior, and changing it to dynamic allocation is not an option. I need to determine what is causing the seg fault with static arrays.
Basically, what is the difference between how statically allocated and dynamically allocated memory is handled? What (beyond the things I have tried) should I try to get past this? I believe the problem is related to a system setting, probably a limit that is only hit when jobs are launched through MPICH2 and not when I am logged into the node directly (hence why it runs fine on nodes that I’m currently logged into).
In my .bashrc file I have “ulimit -s unlimited”, “export OMP_STACKSIZE=4g” and “export KMP_STACKSIZE=4G”, since I’m using the ifort compiler. I believe this must be a relatively simple fix, but I can’t find it.
If the source code of the offending program would help, I can send it out; just let me know. I think the description given here covers the problem, though.
Statically sized local arrays generally go on the stack, while dynamically allocated ones go on the heap. That’s why small static arrays work fine while larger ones do not: the large ones exceed the stack limit.
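Since the stack is the suspect, it can help to look at both the soft and the hard stack limit of the shell you are in. A minimal sketch, assuming bash (the variable names `soft` and `hard` are just for illustration):

```shell
#!/bin/bash
# Soft limit: what a process started from this shell actually gets.
soft=$(ulimit -S -s)
# Hard limit: the ceiling the soft limit can be raised to without root.
hard=$(ulimit -H -s)

# bash reports stack limits in KiB, or the word "unlimited".
echo "soft stack limit: $soft"
echo "hard stack limit: $hard"

if [ "$soft" != "unlimited" ]; then
    echo "soft limit is finite; large stack arrays may overflow it"
fi
```

If the hard limit is finite, “ulimit -s unlimited” in .bashrc will fail for a normal user, and the soft limit stays where it was.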
Since you are using the ifort compiler, you can try compiling with
-heap-arrays, but that only moves automatic and temporary arrays onto the heap (without it, ifort may place such temporaries, like those created in subprograms, on the stack).
The other thing to check is that the MPI job is actually letting you set your stack size. Try running
mpirun -n <numprocs> ulimit -s
and it should show “unlimited” for every process; otherwise it isn’t honoring your .bashrc.
You can try a bash script (myScript.sh) such as:
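(The script body is missing from this copy of the post. What follows is only a sketch of what such a wrapper could look like, not the original. It assumes the goal is to raise the stack limit inside each MPI-launched process and then start the real program, and it reuses the 4g values from the question. It also prints the limit each process sees, which is useful on systems where ulimit exists only as a shell builtin and cannot be launched directly by mpirun.)

```shell
#!/bin/bash
# myScript.sh: hypothetical wrapper, a sketch rather than the original script.

# Try to lift the soft stack limit; if "unlimited" is refused,
# fall back to the highest value allowed (the hard limit).
ulimit -S -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"

# Per-thread stack size for OpenMP worker threads (values from the question).
export OMP_STACKSIZE=4g
export KMP_STACKSIZE=4g

# Report what this process actually got, one line per MPI rank.
echo "$HOSTNAME: stack limit = $(ulimit -S -s)"

# Replace this shell with the real program and its arguments, if any were given.
exec "$@"
```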
which would then be run with:
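(The command is also missing from this copy. A sketch under assumptions: the wrapper is saved as ./myScript.sh and made executable, the program is a hypothetical ./my_hybrid_app, and 4 processes are wanted. The guard only makes the snippet safe to paste on a machine without MPI; on the cluster the mpirun line is the part that matters.)

```shell
#!/bin/bash
NUMPROCS=4                 # assumption: number of MPI processes
APP=./my_hybrid_app        # hypothetical executable name

# The launch line: mpirun starts the wrapper, the wrapper starts the program.
LAUNCH=(mpirun -n "$NUMPROCS" ./myScript.sh "$APP")

if command -v mpirun >/dev/null 2>&1 && [ -x ./myScript.sh ] && [ -x "$APP" ]; then
    "${LAUNCH[@]}"
else
    # Dry run: show the command instead of executing it.
    echo "would run: ${LAUNCH[*]}"
fi
```

Each process should then report its own stack limit before the program starts, which shows directly whether the remote nodes honor the setting.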