Given that I have the array Let Sum be 16 dintptr = { 0

Question

Asked: May 29, 2026

Given the array

dintptr = { 0, 2, 8, 11, 13, 15 }

and Sum = 16,

I want to use the GPU to compute the differences between consecutive elements, with the last count taken as Sum - dintptr[n-1]. The final array should be:

count = { 2, 6, 3, 2, 2, 1 }
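For reference, a plain host-side loop that produces this expected count array might look like the following. It is a sketch: `count_reference` is a hypothetical name, and it treats Sum as the end offset for the last element, as described above.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference for the expected result:
// count[i] = dintptr[i+1] - dintptr[i] for i < n-1, and the last entry
// uses the total `sum` as an implicit terminating offset.
std::vector<int> count_reference(const std::vector<int>& dintptr, int sum)
{
    assert(!dintptr.empty());
    const std::size_t n = dintptr.size();
    std::vector<int> count(n);
    for (std::size_t i = 0; i + 1 < n; ++i)
        count[i] = dintptr[i + 1] - dintptr[i];
    count[n - 1] = sum - dintptr[n - 1];
    return count;
}
```

Calling `count_reference({0, 2, 8, 11, 13, 15}, 16)` gives `{2, 6, 3, 2, 2, 1}`, which is a convenient baseline for checking any GPU version.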

Below is my kernel:

// for this function n is 6

__global__ void kernel(int *dintptr, int *count, int n) {
    int id = blockDim.x * blockIdx.x + threadIdx.x;
    __shared__ int indexes[256];
    int need = (n % 256 == 0) ? 0 : 1;
    int allow = 256 * (n / 256 + need);   // n rounded up to a multiple of 256
    while (id < allow) {
        if (id < n) {
            indexes[threadIdx.x] = dintptr[id];
        }
        __syncthreads();
        if (id < n - 1) {
            if (threadIdx.x % 255 == 0) {
                count[id] = indexes[threadIdx.x + 1] - indexes[threadIdx.x];
            } else {
                count[id] = dintptr[id + 1] - dintptr[id];
            }
        } // end if id < n-1
        __syncthreads();
        id += (gridDim.x * blockDim.x);
    } // end while
} // end kernel
// For the last element, explicitly set count[n-1] = Sum - dintptr[n-1]

Two questions:

1. Is this kernel fast? Can you suggest a faster implementation?
2. Does this kernel handle arrays of arbitrary size? (I think it does.)
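A note on the shared-memory path in the kernel above: with 256 threads per block, `threadIdx.x % 255 == 0` is true for threads 0 and 255, so thread 255 reads `indexes[256]`, one past the end of the array, while every other thread reads global memory anyway. A tiled version needs one extra "halo" element per block. The sketch below is one way to do that, not the asker's method. `diff_tiled`, `load_offset` and `diff_tiled_host` are hypothetical names. It sizes shared memory dynamically so any `blockDim.x` works, and it treats Sum as a virtual element `dintptr[n]` so the last count falls out of the same subtraction. That last design choice is an assumption and does not come from the question.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Treat `sum` as a virtual element dintptr[n] so the last count needs no special case.
__device__ int load_offset(const int* dintptr, int j, int n, int sum)
{
    if (j < n)  return dintptr[j];
    if (j == n) return sum;
    return 0;  // past the end: this value never reaches a stored result
}

// Each block stages blockDim.x elements plus one halo element in dynamically
// sized shared memory (blockDim.x + 1 ints), so it works for any blockDim.x.
__global__ void diff_tiled(const int* dintptr, int* count, int n, int sum)
{
    extern __shared__ int tile[];
    const int stride = gridDim.x * blockDim.x;
    // `base` is identical for all threads in a block, so every thread runs the
    // same number of iterations and __syncthreads() inside the loop is safe.
    for (int base = blockIdx.x * blockDim.x; base < n; base += stride) {
        const int i = base + threadIdx.x;
        tile[threadIdx.x] = load_offset(dintptr, i, n, sum);
        if (threadIdx.x == 0)  // halo: first element of the next tile (or sum)
            tile[blockDim.x] = load_offset(dintptr, base + blockDim.x, n, sum);
        __syncthreads();
        if (i < n)
            count[i] = tile[threadIdx.x + 1] - tile[threadIdx.x];
        __syncthreads();  // don't overwrite the tile while neighbours still read it
    }
}

static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host wrapper: copy in, launch, copy back.
std::vector<int> diff_tiled_host(const std::vector<int>& h_in, int sum,
                                 int threads = 256, int blocks = 4)
{
    assert(!h_in.empty());
    const int n = static_cast<int>(h_in.size());
    const size_t bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));
    const size_t smem = (threads + 1) * sizeof(int);
    diff_tiled<<<blocks, threads, smem>>>(d_in, d_out, n, sum);
    check(cudaGetLastError());
    std::vector<int> h_out(n);
    check(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return h_out;
}
```

With `threads = 4, blocks = 1` the same input is processed in two tiles, which exercises the halo and the grid-stride loop on a tiny array.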

1 Answer

Editorial Team · Answer 1 · 2026-05-29T08:06:38+00:00

I’ll bite.

__global__ void kernel(int *dintptr, int *count, int n)
{
    for (int id = blockDim.x * blockIdx.x + threadIdx.x;
         id < n - 1;
         id += gridDim.x * blockDim.x)
        count[id] = dintptr[id + 1] - dintptr[id];
}

(Since you said you “explicitly” set the value of the last element, and you didn’t in your kernel, I didn’t bother to set it here either.)
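If the last element is to be set separately, as the question describes, a host wrapper could launch the grid-stride kernel for `count[0..n-2]` and then write `count[n-1] = Sum - dintptr[n-1]` with one small copy. This is a minimal sketch: `diff_grid_stride` is the answer's kernel under a hypothetical name (reading `dintptr[id]`, not `dintptr[i]`), and `consecutive_diff` is a hypothetical wrapper.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// The answer's grid-stride kernel: each thread strides by the whole grid size.
__global__ void diff_grid_stride(const int* dintptr, int* count, int n)
{
    for (int id = blockDim.x * blockIdx.x + threadIdx.x; id < n - 1;
         id += gridDim.x * blockDim.x)
        count[id] = dintptr[id + 1] - dintptr[id];
}

static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical wrapper: kernel fills count[0..n-2]; count[n-1] is set explicitly.
std::vector<int> consecutive_diff(const std::vector<int>& h_in, int sum)
{
    assert(!h_in.empty());
    const int n = static_cast<int>(h_in.size());
    const size_t bytes = n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_in, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // at least 1 for n >= 1
    diff_grid_stride<<<blocks, threads>>>(d_in, d_out, n);
    check(cudaGetLastError());

    const int last = sum - h_in[n - 1];  // count[n-1] = Sum - dintptr[n-1]
    check(cudaMemcpy(d_out + (n - 1), &last, sizeof(int), cudaMemcpyHostToDevice));

    std::vector<int> h_out(n);
    check(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return h_out;
}
```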

I don’t see much advantage to using shared memory the way you do in this kernel: the L1 cache on Fermi should give you nearly the same benefit, since your locality is high and reuse is low (each element of dintptr is read by at most two adjacent threads).
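On GPUs newer than Fermi, another way to let the hardware cache handle this reuse without shared memory is the read-only data cache. The sketch below uses hypothetical names. It marks the input `const __restrict__` and reads it with the `__ldg()` intrinsic, which requires compute capability 3.5 or later. For a bandwidth-bound kernel like this one, whether it beats plain loads is something to profile rather than assume. Managed memory is used only to keep the host side short.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Grid-stride kernel whose input is loaded through __ldg(), i.e. the
// read-only data cache (compute capability 3.5 and later).
__global__ void diff_readonly(const int* __restrict__ dintptr,
                              int* __restrict__ count, int n)
{
    for (int id = blockDim.x * blockIdx.x + threadIdx.x; id < n - 1;
         id += gridDim.x * blockDim.x)
        count[id] = __ldg(dintptr + id + 1) - __ldg(dintptr + id);
}

static void check(cudaError_t err)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical wrapper using managed memory.
std::vector<int> diff_readonly_host(const std::vector<int>& h_in, int sum)
{
    assert(!h_in.empty());
    const int n = static_cast<int>(h_in.size());
    int *in = nullptr, *out = nullptr;
    check(cudaMallocManaged(&in, n * sizeof(int)));
    check(cudaMallocManaged(&out, n * sizeof(int)));
    for (int i = 0; i < n; ++i)
        in[i] = h_in[i];

    const int threads = 256;
    diff_readonly<<<(n + threads - 1) / threads, threads>>>(in, out, n);
    check(cudaGetLastError());
    check(cudaDeviceSynchronize());  // needed before the host reads managed memory

    out[n - 1] = sum - in[n - 1];    // last element set on the host
    std::vector<int> result(out, out + n);
    check(cudaFree(in));
    check(cudaFree(out));
    return result;
}
```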

Both your kernel and mine appear to handle arbitrary-sized arrays. Yours, however, appears to assume blockDim.x == 256.
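One way to check the arbitrary-size claim for the grid-stride version is a small test harness. It launches a deliberately tiny grid (for example 2 blocks of 64 threads) over a much larger array, so each thread handles many elements, and it compares the result against a host loop. `matches_host` is a hypothetical name and the input values are arbitrary test data. Only the first n-1 entries are checked, since the last one is set separately.

```cuda
#include <cassert>
#include <vector>

// Grid-stride loop: a thread handles id, id + stride, id + 2*stride, ...
// so the launch size does not have to match n.
__global__ void diff_grid_stride(const int* dintptr, int* count, int n)
{
    for (int id = blockDim.x * blockIdx.x + threadIdx.x; id < n - 1;
         id += gridDim.x * blockDim.x)
        count[id] = dintptr[id + 1] - dintptr[id];
}

// Runs the kernel with a fixed (blocks x threads) launch and compares
// count[0..n-2] with a host loop. Returns false on any CUDA error or mismatch.
bool matches_host(int n, int blocks, int threads)
{
    assert(n >= 1 && blocks >= 1 && threads >= 1);
    int *in = nullptr, *out = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(int)) != cudaSuccess)
        return false;
    if (cudaMallocManaged(&out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(in);
        return false;
    }
    for (int i = 0; i < n; ++i)
        in[i] = 3 * i + i % 7;  // arbitrary test offsets

    diff_grid_stride<<<blocks, threads>>>(in, out, n);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; ok && i < n - 1; ++i)
        ok = out[i] == in[i + 1] - in[i];

    cudaFree(in);
    cudaFree(out);
    return ok;
}
```

For example, `matches_host(1000000, 2, 64)` makes each of the 128 threads process roughly 7,800 elements.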
