I am using OpenMP to parallelize loops. In normal case, one would use: #pragma

Question

Asked: May 26, 2026 (2026-05-26T16:02:54+00:00)

I am using OpenMP to parallelize loops. In the normal case, one would use:

#pragma omp for schedule(static, N_CHUNK)
for(int i = 0; i < N; i++) {
    // ...
}

For nested loops, I can put the pragma on either the inner or the outer loop:

#pragma omp for schedule(static, N_CHUNK) // can be here...
for(int i = 0; i < N; i++) {
#pragma omp for schedule(static, N_CHUNK) // or here...
    for(int k = 0; k < N; k++) {
    // both loops have a constant number of iterations
    // ...
    }
}

But I have two loops where the number of iterations of the second loop depends on the first loop:

for(int i = 0; i < N; i++) {
    for(int k = i; k < N; k++) {
    // k starts from i, not from 0...
    }
}

What is the best way to balance CPU usage for this kind of loop?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T16:02:54+00:00

As always:

- it depends
- profile

In this case, see also the OMP_NESTED environment variable (deprecated since OpenMP 5.0 in favor of OMP_MAX_ACTIVE_LEVELS).

The things that are going to make the difference here are not shown in the question:

- (non)linear memory addressing (also watch the order of the loops)
- use of shared variables
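
As a rough illustration of both points, here is a minimal sketch in C. The function name sum_matrix and the row-major n x n layout are assumptions made for the example, not something taken from the question. The inner loop walks the contiguous index so memory is read linearly, and a reduction clause gives each thread its own accumulator instead of having all threads update one shared variable.

```c
#include <stddef.h>

/* Hypothetical helper: sums an n x n matrix stored in row-major order. */
double sum_matrix(const double *a, int n)
{
    double sum = 0.0;

    /* Each thread accumulates into a private copy of sum; the copies are
       combined when the loop ends, so no lock or critical section is
       needed around the update. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        /* k is the fastest-varying index, so consecutive iterations
           touch adjacent memory. */
        for (int k = 0; k < n; k++) {
            sum += a[(size_t)i * n + k];
        }
    }
    return sum;
}
```

With GCC or Clang this needs -fopenmp to run in parallel; without the flag the pragma is ignored and the function still returns the same result serially. Swapping the two loops would make the inner loop stride through memory n elements at a time, which is usually slower.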

As to your last scenario:

for(int i = 0; i < N; i++) {
    for(int k = i; k < N; k++) {
    // k starts from i, not from 0...
    }
}

I suggest parallelizing the outer loop, for the following reasons:

- All other things being equal, coarse-grained parallelization usually leads to better performance due to
  - increased cache locality
  - reduced frequency of locking required
  (note that this hinges on assumptions about the loop contents that I can’t really make; I’m basing it on my experience of /usual/ parallelized code)
- The inner loop might become so short as to be inefficient to parallelize (in other words: the outer loop’s range is predictable; the inner loop’s is less so, or doesn’t lend itself to static scheduling as well).
- Nested parallelism rarely scales well.
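
A minimal sketch of the outer-loop approach for the triangular case might look like the following. The function name count_upper_triangle, the counting loop body, and the schedule(dynamic, 4) clause are all assumptions for illustration. Because row i runs n - i inner iterations, splitting the outer loop into a few large contiguous blocks would give the thread holding the lowest i values the most work; handing out small chunks on demand is one way to even that out.

```c
/* Hypothetical stand-in for the real loop body: counts the (i, k) pairs
   with i <= k < n. */
long count_upper_triangle(int n)
{
    long total = 0;

    /* Only the outer loop is parallelized. Chunks of 4 outer iterations
       are handed to threads as they become free; the chunk size 4 is an
       arbitrary value that should be tuned by profiling. */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        /* k starts from i, so later rows are shorter than earlier ones. */
        for (int k = i; k < n; k++) {
            total += 1;
        }
    }
    return total;
}
```

Other clauses worth profiling against this one are schedule(static, 1), which deals rows out round-robin with no runtime bookkeeping, and schedule(guided). Which is fastest depends on the cost of the real loop body.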
