I have a program with more than 100 subroutines and I am trying to make this code to run faster and I am trying to compile these subroutines using parallel flag. I was wondering what variable or parameters do I need to define in the program if I want to use the parallel flag. Just using the parallel optimization flag increased the run time for my program compared to the one without parallel flag.
Any suggestions is highly appreciated. Thanks a lot.
Best Regards,
Jdbaba
I can give you some general guidelines, but without knowing your specific compiler and platform/OS I won’t be able to help you specifically. As far as I know, all of the autoparallelization schemes that are used in Fortran compilers end up using either OpenMP or MPI commands to split the loops out into either threads or processes. The issue is that there is a certain amount of overhead associated with those schemes. For instance, in one case I had a program that used an optimization library which was provided by a vendor as a compiled library without optimization within it. As all of my subroutines and functions were either outside or inside the large loop of the optimizer, and since there was only object data, the autoparallelizer wasn’t able to perform ipo and as such it failed to use more than the one core. The run times in this case, due to the DLL that was loaded for OpenMP, the /qparallel actually added ~10% to the run time.
As a note, autoparallelizers aren’t magic. Essentially all they are doing is the same type of thing that the autovectorization techniques do, which is to look for loops that have no data that are dependent upon the previous iteration. If it detects that variables are changed between iterations or if the compiler can’t tell, then it will not attempt to parallelize the loop.
If you are using the Intel Fortran compiler, you can turn on a diagnostic switch “/qpar-report3” or “-par-report3” to give you information as to the dependency tree of loops to see why they failed to optimize. If you don’t have access to large sections of the code you are using, in particular parts with major loops, there is a good chance that there won’t be much opportunity in your code to use the auto-parallelizer.
In any case, you can always attempt to reduce dependencies and reformulate your code such that it is more friendly to autoparallelization.