I’m a beginner in CUDA and I’m trying to implement a Sobel Edge detection kernel.
I’m using this code for it but it doesn’t work.
Can anyone tell me what is wrong with it? I just get some -1’s and some really big values.
__global__ void EdgeDetect_Hor(int *gpu_Edge_Hor, int *gpu_P,
                               int *gpu_Hor, int W, int H)
{
    int X = threadIdx.x;
    int Y = threadIdx.y;
    int sum = 0;
    int k1, k2;
    int min1, min2;
    for (k1 = 0; k1 < 3; k1++)
        for (k2 = 0; k2 < 3; k2++)
            sum += gpu_Hor[k1*3+k2] * gpu_P[(X-k1)*H+Y-k2];
    gpu_Edge_Hor[X*H+Y] = sum/5000;
}
I call this kernel like this:
dim3 dimBlock(W,H);
dim3 dimGrid(1,1);
EdgeDetect_Hor<<<dimGrid, dimBlock>>>(gpu_Edge_Hor, gpu_P, gpu_Hor, W, H);
First, you are processing an image of 480×720 pixels. CUDA supports at most 1024 threads per block on compute capability 2.0 and later, and 512 on earlier devices, so you can’t launch that many threads in a single block. The line
dim3 dimBlock(W,H);
is incorrect. With it the launch fails and the kernel never runs, so the output buffer keeps whatever it already contained, which would explain the -1’s and the very large values. You should divide your threads across several blocks.
Another problem is that CUDA processes data in row-major order, so you should change your memory access pattern.
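To make the two points above concrete, here is a minimal sketch rather than a drop-in fix. It assumes W = 720 is the width and H = 480 the height (the question does not say which is which), uses 16×16 blocks, and centres the mask on each pixel while leaving border pixels at 0, which is not the same indexing as the original (X-k1) form. The names blocksFor and edgeDetectHor are made up for this example, the test image is synthetic, and the /5000 scaling from the question is left out.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: blocks needed to cover n elements (ceiling division).
static int blocksFor(int n, int blockSize) { return (n + blockSize - 1) / blockSize; }

// Applies a 3x3 mask to a row-major image of W columns and H rows.
__global__ void edgeDetectHor(int *out, const int *in, const int *mask, int W, int H)
{
    // Global pixel coordinates: threadIdx alone only indexes within one block.
    int x = blockIdx.x * blockDim.x + threadIdx.x; // column
    int y = blockIdx.y * blockDim.y + threadIdx.y; // row
    if (x >= W || y >= H) return;                  // the grid may overhang the image

    int sum = 0;
    // Assumption: border pixels stay 0 so the mask never reads out of bounds.
    if (x > 0 && x < W - 1 && y > 0 && y < H - 1)
        for (int ky = 0; ky < 3; ky++)
            for (int kx = 0; kx < 3; kx++)
                sum += mask[ky * 3 + kx] * in[(y + ky - 1) * W + (x + kx - 1)];

    // Row-major store: threads adjacent in x write adjacent addresses.
    out[y * W + x] = sum;
}

int main()
{
    const int W = 720, H = 480; // assumed orientation of the 480x720 image
    const int sobel[9] = { -1, 0, 1, -2, 0, 2, -1, 0, 1 };

    // Synthetic test image: left half 0, right half 255 (one vertical edge).
    std::vector<int> img(W * H), edge(W * H, -1);
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            img[y * W + x] = (x >= W / 2) ? 255 : 0;

    int *dIn, *dOut, *dMask;
    cudaMalloc(&dIn, W * H * sizeof(int));
    cudaMalloc(&dOut, W * H * sizeof(int));
    cudaMalloc(&dMask, 9 * sizeof(int));
    cudaMemcpy(dIn, img.data(), W * H * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dMask, sobel, 9 * sizeof(int), cudaMemcpyHostToDevice);

    dim3 block(16, 16); // 256 threads per block, within the per-block limit
    dim3 grid(blocksFor(W, block.x), blocksFor(H, block.y));
    edgeDetectHor<<<grid, block>>>(dOut, dIn, dMask, W, H);

    // A bad launch configuration is reported here, not by the <<<>>> call itself.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(edge.data(), dOut, W * H * sizeof(int), cudaMemcpyDeviceToHost);
    printf("at the edge: %d, flat region: %d\n",
           edge[(H / 2) * W + W / 2], edge[(H / 2) * W + 10]);

    cudaFree(dIn); cudaFree(dOut); cudaFree(dMask);
    return 0;
}
```

With the step image used here, the pixel at the edge should come out as 1020 (255 × (1 + 2 + 1)) and the flat region as 0. Checking cudaGetLastError() right after the launch is also how you would have seen that the original dimBlock(W,H) configuration was rejected.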
The right memory access pattern for 2D arrays in CUDA is
where