I’ve got a nested loop with a counter in between. I’ve managed to use

Question

Asked: June 12, 2026

I’ve got a nested loop with a counter in between.
I’ve managed to use CUDA indices for the outer loop, but I can’t think of a way to exploit more parallelism in this kind of loop.
Do you have any experience working with something similar to that?

int i = threadIdx.x + blockIdx.x * blockDim.x;
if (i < Nx) {
    int counter = 0;
    for (int k = 0; k < Ny; k++) {

        d_V[i*Ny + k] = 0;

        if ( d_X[i*Ny + k] >= 2e2 ) {

             /* do stuff with i and k and counter i.e.*/
                d_example[i*length + counter] = k;
                    ...
             /* increment counter */
             counter++;
        }
    }
}

The problem I see is how to deal with counter, since k could also be indexed in CUDA with threadIdx.y + blockIdx.y * blockDim.y.

1 Answer

Editorial Team · Answer 1 · 2026-06-12T05:45:49+00:00

A counter or loop variable that is carried between loop iterations is the natural antithesis of parallelisation. Ideal parallel loops have iterations that could run in any order, with no knowledge of each other. Unfortunately, a shared variable makes the iterations both order-dependent and mutually aware.

It looks like you’re using the counter to pack your d_example array without gaps. This kind of thing can often be made cheaper in compute time by spending some extra memory: pack d_example inefficiently, leaving the elements that are never set at zero, and filter d_example later, after any expensive computational steps.

In fact, you could even leave the filtering to a modified iterator that skips over zero values when the array is read. If zero is a valid value in the array, use a particular NaN value or a separate mask array instead.

int i = threadIdx.x + blockIdx.x * blockDim.x;
if (i < Nx) {
    for (int k = 0; k < Ny; k++) {

        d_V[i*Ny + k] = 0;

        if ( d_X[i*Ny + k] >= 2e2 ) {

             /* do stuff with i and k; slot k of row i is written directly,
                so no counter is needed (this requires length >= Ny) */
                d_example[i*length + k] = k;
                d_examask[i*length + k] = 1;
                    ...
        } else {
             /* mark this slot as unused */
             d_examask[i*length + k] = 0;
        }
        }
    }
}
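
Once the counter is gone, nothing ties one k iteration to another, so k can be indexed with threadIdx.y as suggested in the question. The following is only a rough sketch of that idea, not a drop-in replacement: the names mark_hits, is_set and packed_hits are made up for illustration, the row stride is assumed to equal Ny, the d_V zeroing and the elided "..." work are left out, and the later filter step is done with Thrust's stencil form of copy_if, which is one possible choice among several.

```cuda
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <vector>

// Hypothetical kernel: one thread per (i, k) pair, no shared counter.
__global__ void mark_hits(const float* d_X, int* d_example, int* d_examask,
                          int Nx, int Ny)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;  // outer index
    int k = threadIdx.y + blockIdx.y * blockDim.y;  // former loop index
    if (i < Nx && k < Ny) {
        int slot = i * Ny + k;            // assumption: row stride is Ny
        bool hit = d_X[slot] >= 2e2f;
        d_example[slot] = hit ? k : 0;    // sparse write, gaps stay zero
        d_examask[slot] = hit ? 1 : 0;    // mask marks the valid slots
    }
}

// Predicate for the deferred filter: keep entries whose mask is set.
struct is_set {
    __host__ __device__ bool operator()(int m) const { return m != 0; }
};

// Hypothetical host helper: runs the kernel, then packs all valid
// entries (row-major order) into a gap-free host vector.
std::vector<int> packed_hits(const std::vector<float>& X, int Nx, int Ny)
{
    thrust::device_vector<float> d_X(X.begin(), X.end());
    thrust::device_vector<int> d_example(Nx * Ny);
    thrust::device_vector<int> d_examask(Nx * Ny);
    thrust::device_vector<int> d_packed(Nx * Ny);

    dim3 block(16, 16);
    dim3 grid((Nx + block.x - 1) / block.x, (Ny + block.y - 1) / block.y);
    mark_hits<<<grid, block>>>(thrust::raw_pointer_cast(d_X.data()),
                               thrust::raw_pointer_cast(d_example.data()),
                               thrust::raw_pointer_cast(d_examask.data()),
                               Nx, Ny);

    // Deferred filter: copy d_example[j] only where d_examask[j] != 0.
    auto packed_end = thrust::copy_if(d_example.begin(), d_example.end(),
                                      d_examask.begin(), d_packed.begin(),
                                      is_set());

    std::vector<int> out(packed_end - d_packed.begin());
    thrust::copy(d_packed.begin(), packed_end, out.begin());
    return out;
}
```

Note that this packs all rows into one flat array, so the per-row boundaries that counter gave you are lost. If you need them, the mask array still holds enough information to recover a count per row.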
