I have a problem that I can’t solve. The problem is as follows. CPP

Question

Editorial Team

Asked: June 8, 2026 (2026-06-08T01:19:55+00:00)

I have a problem that I can’t solve.

The problem is as follows.

CPP code

const int dataSize = 65535;
const int category = 10;
float data[dataSize][category];
const float threshold = 0.5f;

int cnt = 0;

// data array contains any values

for(int i=0;i<dataSize;i++)
{
    if( data[i][9] > threshold )
    {
        data[cnt][0] = data[i][0];
        data[cnt][1] = data[i][1];
        data[cnt][2] = data[i][2];
        data[cnt][3] = data[i][3];
        data[cnt][4] = data[i][4];
        data[cnt][5] = data[i][5];
        data[cnt][6] = data[i][6];
        data[cnt][7] = data[i][7];
        data[cnt][8] = data[i][8];
        data[cnt][9] = data[i][9];
        cnt++;
    }
}

With this code, I expect the rows of the ‘data’ array whose last element is over the threshold to be collected at the front of the array. (Rows that are not over the threshold do not matter to me; only the rows over the threshold are important.)

I want code that produces the same result in CUDA.

So I tried the following.

CUDA code

__global__ void checkOverThreshold(float *data, float threshold, int *nCount)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;

    if( data[idx*10+9] > threshold )
    {
        data[nCount+0] = data[idx*10+0];
        data[nCount+1] = data[idx*10+1];
        data[nCount+2] = data[idx*10+2];
        data[nCount+3] = data[idx*10+3];
        data[nCount+4] = data[idx*10+4];
        data[nCount+5] = data[idx*10+5];
        data[nCount+6] = data[idx*10+6];
        data[nCount+7] = data[idx*10+7];
        data[nCount+8] = data[idx*10+8];
        data[nCount+9] = data[idx*10+9];
        atomicAdd( nCount, 1);
    }
}

....

// kernel function call
checkOverThreshold<<< dataSize / 128, 128 >>>(d_data, threshold, d_count);

But the result of the CUDA code is not what I expected.

It contains a lot of garbage values, and the result does not even match the C++ version’s.

I think a synchronization problem with the nCount variable is causing this.

But I have no idea how to solve this problem.

Please help me fix my code. Thank you in advance.

1 Answer

Editorial Team · Answer 1 · 2026-06-08T01:19:58+00:00

This code is broken:

    data[nCount+0] = data[idx*10+0];
    data[nCount+1] = data[idx*10+1];
    data[nCount+2] = data[idx*10+2];
    data[nCount+3] = data[idx*10+3];
    data[nCount+4] = data[idx*10+4];
    data[nCount+5] = data[idx*10+5];
    data[nCount+6] = data[idx*10+6];
    data[nCount+7] = data[idx*10+7];
    data[nCount+8] = data[idx*10+8];
    data[nCount+9] = data[idx*10+9];
    atomicAdd( nCount, 1);

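nCount is a pointer, so it cannot be used directly as an index. The value it points to can also be changed by other threads while these assignments run, so several threads end up writing into the same place and nonsense results. Reserve a row first by using the value that atomicAdd returns, and index that row as d*10:

    int d = atomicAdd(nCount, 1);  // old count = row reserved for this thread
    data[d*10+0] = data[idx*10+0];
    data[d*10+1] = data[idx*10+1];
    data[d*10+2] = data[idx*10+2];
    data[d*10+3] = data[idx*10+3];
    data[d*10+4] = data[idx*10+4];
    data[d*10+5] = data[idx*10+5];
    data[d*10+6] = data[idx*10+6];
    data[d*10+7] = data[idx*10+7];
    data[d*10+8] = data[idx*10+8];
    data[d*10+9] = data[idx*10+9];

The counter must be zero before the kernel is launched. Writing back into data itself can still overwrite a row that another thread has not read yet, so a separate output array is safer.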
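
To illustrate the reserve-then-write idea more fully, here is a minimal sketch of a complete kernel with a host wrapper. It is not the asker's original program. The names compactOverThreshold, compactRows and kCols are made up for this example. The output goes to a separate buffer, which is an assumption, chosen so that no thread overwrites a row another thread still has to read. The launch rounds the block count up and guards idx, because 65535 is not a multiple of 128 and dataSize / 128 blocks would skip the last rows. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kCols = 10;  // floats per row, as in the question (name is hypothetical)

// Copies every row of `in` whose last column exceeds `threshold` into `out`.
// `count` must be zero before launch; afterwards it holds the number of rows kept.
__global__ void compactOverThreshold(const float* in, float* out, int numRows,
                                     float threshold, int* count)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= numRows) return;  // guard the partially filled last block

    if (in[idx * kCols + kCols - 1] > threshold) {
        // atomicAdd returns the old value, so each thread gets a unique row d.
        int d = atomicAdd(count, 1);
        for (int k = 0; k < kCols; ++k)
            out[d * kCols + k] = in[idx * kCols + k];
    }
}

// Hypothetical host helper: returns the kept rows, flattened, in unspecified order.
std::vector<float> compactRows(const std::vector<float>& host, float threshold)
{
    const int numRows = static_cast<int>(host.size()) / kCols;
    if (numRows == 0) return {};

    const size_t bytes = static_cast<size_t>(numRows) * kCols * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    int* d_count = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_in, host.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));  // the counter starts at zero

    const int threads = 128;
    const int blocks = (numRows + threads - 1) / threads;  // round up
    compactOverThreshold<<<blocks, threads>>>(d_in, d_out, numRows, threshold, d_count);

    // These copies run after the kernel on the default stream, so they see its results.
    int kept = 0;
    cudaMemcpy(&kept, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    std::vector<float> result(static_cast<size_t>(kept) * kCols);
    cudaMemcpy(result.data(), d_out, result.size() * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_count);
    return result;
}
```

Because threads reach atomicAdd in no particular order, the kept rows generally come out in a different order than the sequential C++ loop produces. If the order matters, an order-preserving stream compaction would be needed instead.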
