I’m having issues understanding how OpenMP works with nested loops. Please help! I got

Question

0

Editorial Team

Asked: June 15, 20262026-06-15T01:27:56+00:00 2026-06-15T01:27:56+00:00

I’m having issues understanding how OpenMP works with nested loops. Please help! I got

0

I’m having issues understanding how OpenMP works with nested loops. Please help!

I got the following code to run in parallel:

#pragma omp parallel private(i)
{
for(i=0; i<n; i++)
{
    #pragma omp for
    for(j=0; j<n; j++)
    {
        if(asubsref(seed,j) > 0)
            asubsref(bin,j) = asubsref(bin,j) + 1;
    }
    #pragma omp for
    for(j=0; j<n; j++)
        asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
}
}

However, I’m not quite sure how this code works (I just got it by luck). Here’s what I think it’s doing…

So for(i=0; i<n; i++) is split off into different threads and run in parallel. Because i was declared to be private, each instance of the loop is “sandboxed”; that is, any changes to j remain in that thread (at least until all the threads are done?). I’m confused because not declaring #pragma omp for causes the code to break…I’m not sure why this is.

If someone could walk me through what this code is doing, I’d appreciate it very much! Thank you!

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-15T01:27:58+00:00

This is what one typically does in order to reduce the overhead of multiple enters and exists into and out of a parallel region. Compare the code from your question to the following equivalent code:

for (i=0; i<n; i++)
{
    #pragma omp parallel for
    for (j=0; j<n; j++)
    {
        if (asubsref(seed,j) > 0)
            asubsref(bin,j) = asubsref(bin,j) + 1;
    }
    #pragma omp parallel for
    for (j=0; j<n; j++)
        asubsref(seed,j) = asubsref(seed,j) - asubsref(w,i);
}

It runs the outer loop over i n times. In each iteration of the outer loop, it runs two parallel loops over j, splitting the n iterations among the threads. The problem here is that you have two parallel regions inside, each of which is activated n times. This could add significant overhead for a certain interval of values of n (the actual interval would depend on many factors).

In order to reduce the overhead, one puts the whole code inside a parallel region. Thus each thread would execute all the iterations of the outer loop and the inner iterations would still be split among the threads. If you remove the worksharing constructs, then the inner loop would not get distributed among the threads. Instead, each thread would execute both the full range of outer and inner iterations, so for example asubsref(bin,j) would get incremented up to num_threads times instead of just once (why “up to”? hint: unprotected concurrent data access).

Most of the times when one runs an outer loop inside a parallel construct, one wants different threads to keep ticking on the same iterations. This could be achieved by a barrier at the end of the loop body:

#pragma omp parallel private(i)
{
    for (i = 0; i < n; i++)
    {
        ...
        #pragma omp barrier
    }
}

In your case an explicit barrier is not necessary since the for worksharing construct has an implicit barrier at its end.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m having issues understanding how OpenMP works with nested loops. Please help! I got

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply