I’m faced with parallelizing an algorithm which, in its serial implementation, examines the six faces of a cube of array locations within a much larger three-dimensional array. (That is, select an array element, and then define a cube or cuboid around that element ‘n’ elements distant in x, y, and z, bounded by the bounds of the array.)
Each work unit looks something like this (Fortran pseudocode; the serial algorithm is in Fortran):
    do n1 = nlo, nhi
      do o1 = olo, ohi
        ! somecondition() returns a LOGICAL, so test it directly
        if (somecondition(n1, o1)) then
          retval = .TRUE.
          RETURN
        endif
      end do
    end do
Or C pseudocode:
    for (n1 = nlo; n1 <= nhi; n1++) {
        for (o1 = olo; o1 <= ohi; o1++) {
            if (somecondition(n1, o1) != 0) {
                return true;
            }
        }
    }
There are six work units like this in the total algorithm, where the ‘lo’ and ‘hi’ values generally range between 10 and 300.
What I think would be best would be to schedule six or more threads of execution, round-robin if there aren’t that many CPU cores, ideally with the loops executing in parallel, with the same goal as the serial algorithm: as soon as somecondition() becomes True, execution in all the threads must stop immediately and a value of True must be set in a shared location.
What techniques exist in a Windows compiler to facilitate parallelizing tasks like this? Obviously, I need a master thread which waits on a semaphore or the completion of the worker threads, so there is a need for nesting and signaling, but my experience with OpenMP is introductory at this point.
Are there message passing mechanisms in OpenMP?
EDIT: If the highest difference between “nlo” and “nhi” or “olo” and “ohi” is eight to ten, that would imply no more than 64 to 100 iterations for this nested loop, and no more than 384 to 600 iterations for the six work units together. Based on that, is it worth parallelizing at all?
One possibility is to use OpenMP to parallelize over the 6 loops: declare logical :: array(6), allow each loop to run to completion, and then set retval = any(array). Then you can check this value and return outside the parallelized loop. Add a schedule(dynamic) to the parallel do statement if you do this. Or, have a separate !$omp parallel and then put !$omp do schedule(dynamic) … !$omp end do nowait around each of the 6 loops.

Or, you can follow the good advice by @M.S.B. and parallelize the outermost loop over the whole array. The problem here is that you cannot have a RETURN inside a parallel loop, so label the second outermost loop (the largest one within the parallel part) and EXIT that loop.

[Edit: the if statement is there presuming that you need to find out if there is at least one element like that in the big array. If you need to figure out the condition for every element, you can similarly either add a dummy loop exit or a goto, skipping the rest of the processing for that element. Again, use schedule(dynamic) or schedule(guided).]

As a separate point, you might also want to check whether it is a good idea to go through the innermost loop in some larger step (depending on float size), compute a vector of logicals on each iteration, and then aggregate the results, e.g. something like if (count(somecondition(x(o1:o1+step,n1,k))) > 0); in this case the compiler may be able to vectorize somecondition.