I’m having issues understanding how OpenMP works with nested loops. Please help!
I got the following code to run in parallel:
#pragma omp parallel private(i)
{
for(i=0; i<n; i++)
{
#pragma omp for
for(j=0; j<n; j++)
{
if(asubsref(seed,j) > 0)
asubsref(bin,j) = asubsref(bin,j) + 1;
}
#pragma omp for
for(j=0; j<n; j++)
asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
}
}
However, I’m not quite sure how this code works (I just got it by luck). Here’s what I think it’s doing…
So for(i=0; i<n; i++) is split off into different threads and run in parallel. Because i was declared to be private, each instance of the loop is “sandboxed”; that is, any changes to j remain in that thread (at least until all the threads are done?). I’m confused because not declaring #pragma omp for causes the code to break…I’m not sure why this is.
If someone could walk me through what this code is doing, I’d appreciate it very much! Thank you!
This is what one typically does in order to reduce the overhead of multiple enters and exists into and out of a parallel region. Compare the code from your question to the following equivalent code:
It runs the outer loop over
intimes. In each iteration of the outer loop, it runs two parallel loops overj, splitting theniterations among the threads. The problem here is that you have two parallel regions inside, each of which is activatedntimes. This could add significant overhead for a certain interval of values ofn(the actual interval would depend on many factors).In order to reduce the overhead, one puts the whole code inside a parallel region. Thus each thread would execute all the iterations of the outer loop and the inner iterations would still be split among the threads. If you remove the worksharing constructs, then the inner loop would not get distributed among the threads. Instead, each thread would execute both the full range of outer and inner iterations, so for example
asubsref(bin,j)would get incremented up tonum_threadstimes instead of just once (why “up to”? hint: unprotected concurrent data access).Most of the times when one runs an outer loop inside a parallel construct, one wants different threads to keep ticking on the same iterations. This could be achieved by a barrier at the end of the loop body:
In your case an explicit barrier is not necessary since the
forworksharing construct has an implicit barrier at its end.