I am writing a CUDA kernel for Histogram on a picture, but I had

Asked: May 15, 2026

I am writing a CUDA kernel that computes a histogram of an image, but I have no idea how to return an array from the kernel, and the array changes while other threads are reading it. Is there a possible solution?

__global__ void Hist(
    TColor *dst,  // input image
    int imageW,
    int imageH,
    int *data     // output histogram
){
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;

    if (ix < imageW && iy < imageH)
    {
        // assign the red value of this thread's pixel to `pixel`
        int pixel = get_red(dst[imageW * iy + ix]);

        data[pixel]++;  // ?? problem statement: many threads update the same element
    }
}

@param d_dst: the input image; TColor is equivalent to float4.

@param data: the histogram array, of size [255].

extern "C" void
cuda_Hist(TColor *d_dst, int imageW, int imageH,int* data) 
{
  dim3 threads(BLOCKDIM_X, BLOCKDIM_Y);
  dim3 grid(iDivUp(imageW, BLOCKDIM_X), iDivUp(imageH, BLOCKDIM_Y));
  Hist<<<grid, threads>>>(d_dst, imageW, imageH, data);
}


1 Answer

Editorial Team · Answer 1 · 2026-05-15T05:45:26+00:00

Have you looked at the SDK sample? The “histogram” sample is available in the CUDA SDK (currently version 3.0 on the NVIDIA developer site; a 3.1 beta is available to registered developers).

The documentation that accompanies the sample explains clearly how to handle the summation: either use global memory atomics on the GPU, or collect the results for each block separately and then perform a separate reduction (on either the host or the GPU).
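To make the two approaches concrete, here is a minimal sketch rather than the SDK sample's actual code. It makes several assumptions that are not in the question: the red channel is taken to be the `.x` component of a `float4` in the range [0, 1], the hypothetical `red_bin` helper stands in for `get_red`, the histogram has 256 bins (8-bit values 0..255 need 256 entries, not 255), and `red_histogram` is an illustrative host wrapper. The first kernel uses global memory atomics directly; the second collects a per-block histogram in shared memory and merges it into the global result at the end, which is one way to do the per-block reduction on the GPU.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define NUM_BINS 256  // assumption: one bin per 8-bit red value (0..255)

// Hypothetical stand-in for get_red(): assumes red is stored in .x as a float in [0, 1].
__device__ int red_bin(float4 c)
{
    int bin = (int)(c.x * 255.0f + 0.5f);
    return min(max(bin, 0), NUM_BINS - 1);  // clamp out-of-range values
}

// Variant 1: every thread updates the global histogram with an atomic add,
// so concurrent increments of the same bin are never lost.
__global__ void HistGlobalAtomic(const float4 *src, int imageW, int imageH, int *hist)
{
    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if (ix < imageW && iy < imageH)
        atomicAdd(&hist[red_bin(src[imageW * iy + ix])], 1);
}

// Variant 2: each block accumulates a private histogram in shared memory,
// then merges its partial counts into the global histogram.
__global__ void HistSharedAtomic(const float4 *src, int imageW, int imageH, int *hist)
{
    __shared__ int local[NUM_BINS];
    const int tid = threadIdx.y * blockDim.x + threadIdx.x;
    const int nthreads = blockDim.x * blockDim.y;

    for (int i = tid; i < NUM_BINS; i += nthreads)
        local[i] = 0;  // clear the block-local bins
    __syncthreads();

    const int ix = blockDim.x * blockIdx.x + threadIdx.x;
    const int iy = blockDim.y * blockIdx.y + threadIdx.y;
    if (ix < imageW && iy < imageH)
        atomicAdd(&local[red_bin(src[imageW * iy + ix])], 1);
    __syncthreads();

    for (int i = tid; i < NUM_BINS; i += nthreads)
        if (local[i] != 0)
            atomicAdd(&hist[i], local[i]);  // merge this block's counts
}

// Illustrative host wrapper: uploads the image, runs one variant,
// and copies the histogram back to the host. Error checking is omitted.
std::vector<int> red_histogram(const std::vector<float4> &img, int w, int h, bool useShared)
{
    float4 *d_img = nullptr;
    int *d_hist = nullptr;
    cudaMalloc(&d_img, img.size() * sizeof(float4));
    cudaMalloc(&d_hist, NUM_BINS * sizeof(int));
    cudaMemcpy(d_img, img.data(), img.size() * sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemset(d_hist, 0, NUM_BINS * sizeof(int));  // bins must start at zero

    dim3 threads(16, 16);
    dim3 grid((w + threads.x - 1) / threads.x, (h + threads.y - 1) / threads.y);
    if (useShared)
        HistSharedAtomic<<<grid, threads>>>(d_img, w, h, d_hist);
    else
        HistGlobalAtomic<<<grid, threads>>>(d_img, w, h, d_hist);

    std::vector<int> hist(NUM_BINS);
    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(hist.data(), d_hist, NUM_BINS * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_img);
    cudaFree(d_hist);
    return hist;
}
```

In both variants the array is "returned" the usual CUDA way: the kernel writes into device memory that the host allocated and zeroed, and the host copies it back with `cudaMemcpy` after the launch. The shared-memory variant usually reduces contention on global memory when many pixels fall into the same bin, but which one is faster depends on the image and the hardware.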
