I am having a problem transposing an image. This is the kernel I call:
// index of the pixel in the image
int index_in = index_x + index_y * width;
int index_out = index_x + index_y * height;
// Allocate the shared memory
__shared__ unsigned int onchip_storage[16][16];
// Load the inputs into shared memory
onchip_storage[threadIdx.y][threadIdx.x] = in[index_in];
// Wait until every thread in the block has written its element
__syncthreads();
// Write the transposed value back to global memory
out[index_out] = onchip_storage[threadIdx.x][threadIdx.y];
The image comes out rotated, but the colors are not the same as in the original. Any idea why?
Thanks in advance.
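Assuming your RGB components are interleaved, your algorithm is not handling the three components correctly: it transposes each component as if it were a separate pixel, so the R, G and B values end up in the wrong pixels and channels. You need to make your tile size a multiple of 3 in width, e.g. 18 x 18, so that each tile row holds whole pixels (6 of them). Then, when you do the transpose, you need to transpose elements that are 3 x 4 = 12 bytes wide, i.e. whole pixels rather than single components.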
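Here is a minimal sketch of that idea, assuming each pixel is three interleaved 32-bit components (so one pixel is 3 x 4 = 12 bytes, matching the `unsigned int` input in the question). Instead of an 18 x 18 tile of `unsigned int`, it uses a 16 x 16 tile of a hypothetical 12-byte `Pixel` struct, so each tile row is 48 `unsigned int`s wide, still a multiple of 3. The names `Pixel`, `TILE_DIM`, `transposePixels` and `transposeImage` are illustrative, not from the original code.

```cuda
#include <cuda_runtime.h>

// Assumption: R, G and B are interleaved 32-bit values, so one pixel = 12 bytes.
struct Pixel {
    unsigned int r, g, b;
};

// Hypothetical tile size: 16 pixels per row = 48 unsigned ints (a multiple of 3).
#define TILE_DIM 16

// Transposes a width x height image into a height x width image.
// Each thread moves one whole 12-byte pixel, so R, G and B stay together.
__global__ void transposePixels(const Pixel* in, Pixel* out, int width, int height)
{
    __shared__ Pixel tile[TILE_DIM][TILE_DIM];

    // Coordinates of the pixel this thread reads from the input image.
    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    if (x < width && y < height) {
        tile[threadIdx.y][threadIdx.x] = in[x + y * width];
    }

    // All loads must finish before any thread reads a pixel loaded by another thread.
    __syncthreads();

    // Swap the block indices so the whole tile lands at its transposed position.
    x = blockIdx.y * TILE_DIM + threadIdx.x;  // column in the output (0..height-1)
    y = blockIdx.x * TILE_DIM + threadIdx.y;  // row in the output (0..width-1)
    if (x < height && y < width) {
        out[x + y * height] = tile[threadIdx.x][threadIdx.y];
    }
}

// Hypothetical host helper: copies the image to the GPU, runs the kernel and
// copies the transposed result back. Returns false if any CUDA call fails.
bool transposeImage(const Pixel* h_in, Pixel* h_out, int width, int height)
{
    size_t bytes = sizeof(Pixel) * width * height;
    Pixel* d_in = nullptr;
    Pixel* d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    cudaError_t err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE_DIM, TILE_DIM);
        dim3 grid((width + TILE_DIM - 1) / TILE_DIM, (height + TILE_DIM - 1) / TILE_DIM);
        transposePixels<<<grid, block>>>(d_in, d_out, width, height);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    if (err == cudaSuccess) {
        // This blocking copy also reports faults raised while the kernel ran.
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Copying the struct moves all three components in one assignment, so the channel order inside each pixel is preserved. The sketch also synchronizes between the shared-memory load and store and swaps `blockIdx.x`/`blockIdx.y` on the write, both of which a tiled shared-memory transpose needs when the image spans more than one block.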