I am using OpenMP to parallelize loops. In the normal case, one would use:
#pragma omp for schedule(static, N_CHUNK)
for(int i = 0; i < N; i++) {
// ...
}
For nested loops, I can put the pragma on either the inner or the outer loop:
#pragma omp for schedule(static, N_CHUNK) // can be here...
for(int i = 0; i < N; i++) {
#pragma omp for schedule(static, N_CHUNK) // or here...
for(int k = 0; k < N; k++) {
// both loops have a constant number of iterations
// ...
}
}
But I have two loops where the number of iterations of the second loop depends on the first loop:
for(int i = 0; i < N; i++) {
for(int k = i; k < N; k++) {
// k starts from i, not from 0...
}
}
What is the best way to balance CPU usage for this kind of loop?
As always:
The things that are going to make the difference here are not being shown:
As to your last scenario:
I suggest parallelizing the outer loop for the following reasons:
all other things being equal, coarse-grained parallelization usually leads to better performance due to
(note that this hinges on assumptions about the loop contents that I can’t really make; I’m basing it on my experience of /usual/ parallelized code)
the inner loop might become so short as to be inefficient to parallelize (IOW: the outer loop’s range is predictable, the inner loop less so, or doesn’t lend itself to static scheduling as well)
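A minimal sketch of the outer-loop approach, under a few assumptions that are not in the question: the loop body is independent across values of `i`, the function name `triangular_sum` and its workload are hypothetical stand-ins for the real body, and the combined `parallel for` form is used only so the snippet is self-contained. Because row `i` runs `N - i` inner iterations, contiguous static chunks would hand the first thread the longest rows; `schedule(static, 1)` is one possible way to interleave long and short rows across threads, and `schedule(dynamic)` or `schedule(guided)` are alternatives worth measuring against the real workload.

```c
/* Hypothetical workload: sums i*k over the upper triangle (k >= i).
   It stands in for whatever the real loop body does. */
long long triangular_sum(int n)
{
    long long total = 0;

    /* Only the outer loop is parallelized. schedule(static, 1) hands out
       rows round-robin, so every thread receives a mix of long rows
       (small i) and short rows (large i). The reduction clause gives each
       thread a private partial sum and combines them at the end. */
    #pragma omp parallel for schedule(static, 1) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int k = i; k < n; k++) {   /* inner loop stays serial */
            total += (long long)i * k;
        }
    }
    return total;
}
```

Built without OpenMP support, the pragma is ignored and the function runs serially with the same result, which makes it easy to compare against the parallel build (for example with `-fopenmp` on GCC or Clang).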