Hello I’m trying to write a CUDA kernel to perform the following piece of

Question

Asked: May 20, 2026 (2026-05-20T15:15:36+00:00)

Hello
I’m trying to write a CUDA kernel to perform the following piece of code.

for (n = 0; n < (total-1); n++)
{
  a = values[n];

  for ( i = n+1; i < total ; i++)
  {
    b = values[i] - a;
    c = b*b;

    if (c < 10)
        newvalues[i] = c;
  }
}

This is what I have currently, but it does not seem to give the correct results. Does anyone know what I’m doing wrong? Cheers

__global__ void calc(int total, float *values, float *newvalues){

    float a, b, c;

    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    for (int n = idx; n < (total-1); n += blockDim.x*gridDim.x){
        a = values[n];

        for (int i = n+1; i < total; i++){
            b = values[i] - a;
            c = b*b;

            if (c < 10)
                newvalues[i] = c;
        }
    }
}

1 Answer

Editorial Team · Answer 1 · 2026-05-20T15:15:37+00:00

Treat this as a 2D problem and launch the kernel with 2D thread blocks, so that each thread handles one (n, i) pair. The total number of threads in each of the x and y dimensions must be at least total. The kernel code should look like this:

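__global__ void calc(float *values, float *newvalues, int total){

    float a, b, c;

    // One thread per (n, i) pair: n comes from the y dimension, i from x.
    int n = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // The grid is rounded up, so discard threads that fall outside the data.
    if (n >= total || i >= total)
        return;

    // The serial inner loop starts at i = n+1, so skip pairs with i <= n.
    if (i <= n)
        return;

    a = values[n];
    b = values[i] - a;
    c = b * b;
    if (c < 10)
        newvalues[i] = c;

    // I don't know your problem statement, but I think it should be:
    //     newvalues[n*total+i] = c;
    // As written, several values of n can write the same newvalues[i].

}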
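The closing comment in the kernel points at a real difference from the serial loop: with one thread per (n, i) pair, several threads can store to the same newvalues[i], and which store survives is not defined, whereas the serial code always keeps the result from the largest n. If one result per pair is what is actually wanted, a minimal sketch could look like the following. The names calc_pairs and compute_pairs, the -1.0f sentinel for "nothing stored", and the use of managed memory are all assumptions made for illustration, not part of the original answer. Note that the output needs total*total floats, so this only suits moderate sizes.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical variant of calc: one output slot per (n, i) pair, so no two
// threads ever write the same element.
__global__ void calc_pairs(const float *values, float *pairs, int total)
{
    int n = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (n >= total || i >= total)
        return;

    float result = -1.0f;            // assumed sentinel meaning "nothing stored"
    if (i > n) {                     // same pairs as the serial loop
        float b = values[i] - values[n];
        float c = b * b;
        if (c < 10.0f)
            result = c;
    }
    pairs[n * total + i] = result;   // row n, column i
}

// Hypothetical host wrapper; managed memory keeps the sketch short.
// Returns total*total results in row-major order, or an empty vector on error.
std::vector<float> compute_pairs(const std::vector<float> &h_values)
{
    const int total = static_cast<int>(h_values.size());
    std::vector<float> h_pairs;
    if (total == 0)
        return h_pairs;

    float *values = nullptr, *pairs = nullptr;
    const size_t count = static_cast<size_t>(total) * total;
    if (cudaMallocManaged(&values, total * sizeof(float)) != cudaSuccess)
        return h_pairs;
    if (cudaMallocManaged(&pairs, count * sizeof(float)) != cudaSuccess) {
        cudaFree(values);
        return h_pairs;
    }
    std::copy(h_values.begin(), h_values.end(), values);

    dim3 block(16, 16);
    dim3 grid((total + 15) / 16, (total + 15) / 16);
    calc_pairs<<<grid, block>>>(values, pairs, total);

    // Check the launch, then wait for the kernel before reading managed memory.
    if (cudaGetLastError() == cudaSuccess &&
        cudaDeviceSynchronize() == cudaSuccess)
        h_pairs.assign(pairs, pairs + count);

    cudaFree(values);
    cudaFree(pairs);
    return h_pairs;
}
```

If the single-array result of the serial loop is required instead, the pair matrix can be reduced on the host by taking, for each i, the entry with the largest n that is not the sentinel.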

Update:

This is how you should call the kernel

dim3 block(16, 16);
dim3 grid((total + 15) / 16, (total + 15) / 16);  // round up so the grid covers all of total
calc<<<grid, block>>>(val, newval, total);        // val and newval are device pointers

Also make sure the kernel contains this bounds check (see the updated kernel above), because the rounded-up grid launches threads beyond total:

if (n>=total || i>=total)
    return;
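For context, the launch lines above only work once both arrays live in device memory. The following is a rough sketch of the surrounding host code, assuming the answer's kernel signature. The CUDA_CHECK macro and the run_calc wrapper are hypothetical names introduced here, and the extra i <= n test in the kernel is an assumption about the intended behaviour (it mirrors the serial inner loop starting at n+1).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: print a message and abort if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);    \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Kernel as in the answer: one thread per (n, i) pair.
__global__ void calc(float *values, float *newvalues, int total)
{
    int n = blockIdx.y * blockDim.y + threadIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (n >= total || i >= total)
        return;
    if (i <= n)   // assumption: keep only the i > n pairs of the serial loop
        return;

    float b = values[i] - values[n];
    float c = b * b;
    if (c < 10)
        newvalues[i] = c;
}

// Hypothetical wrapper: copies the input to the device, runs calc, and
// returns newvalues (elements that were never written stay 0).
std::vector<float> run_calc(const std::vector<float> &h_values)
{
    const int total = static_cast<int>(h_values.size());
    std::vector<float> h_newvalues(total, 0.0f);
    if (total == 0)
        return h_newvalues;   // a grid with zero blocks is not a valid launch

    float *d_values = nullptr, *d_newvalues = nullptr;
    const size_t bytes = total * sizeof(float);
    CUDA_CHECK(cudaMalloc(&d_values, bytes));
    CUDA_CHECK(cudaMalloc(&d_newvalues, bytes));
    CUDA_CHECK(cudaMemcpy(d_values, h_values.data(), bytes,
                          cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemset(d_newvalues, 0, bytes));   // all-zero bytes == 0.0f

    dim3 block(16, 16);
    dim3 grid((total + 15) / 16, (total + 15) / 16);
    calc<<<grid, block>>>(d_values, d_newvalues, total);
    CUDA_CHECK(cudaGetLastError());        // invalid launch configuration etc.
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_newvalues.data(), d_newvalues, bytes,
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_values));
    CUDA_CHECK(cudaFree(d_newvalues));
    return h_newvalues;
}
```

Checking cudaGetLastError right after the launch is worth keeping even in a sketch: a kernel that fails to launch otherwise leaves newvalues untouched, which looks exactly like "wrong results".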

Update 2:
Fixed a typo in the kernel: blockIdy.y should be blockIdx.y.
