I want to store an image on the device and process it there.
I am using the following code to copy the image to device memory:
int *image = new int[W*H];              // host copy of the image, row-major
// init image here
int *devImage;                          // device copy of the image
size_t sizei = (size_t)W*H*sizeof(int); // size in bytes
cudaMalloc((void**)&devImage, sizei);
cudaMemcpy(devImage, image, sizei, cudaMemcpyHostToDevice);
// call device function here
I have two device functions. The first accesses the image from left to right, and the second accesses it from top to bottom. I found that the top-to-bottom access takes much less time than the left-to-right access. This appears to be because of the time needed to access memory.
How can I efficiently access the memory in CUDA?
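A likely explanation is memory coalescing. The sketch below assumes one thread per row in the left-to-right function and one thread per column in the top-to-bottom function, which the question does not actually state. The kernel names `sumRows` and `sumCols` and the helper `checkSums` are made up for illustration. With a row-major image, neighbouring threads in `sumCols` read neighbouring addresses at every step, so a warp's loads can be combined into few memory transactions. In `sumRows`, neighbouring threads read addresses `W` elements apart.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical "left to right" kernel: one thread per row.
// At loop step x, thread y reads img[y*W + x], so neighbouring threads
// are W ints apart in memory (strided, poorly coalesced).
__global__ void sumRows(const int *img, int *out, int W, int H) {
    int y = blockIdx.x * blockDim.x + threadIdx.x;
    if (y >= H) return;
    int s = 0;
    for (int x = 0; x < W; ++x) s += img[y * W + x];
    out[y] = s;
}

// Hypothetical "top to bottom" kernel: one thread per column.
// At loop step y, thread x reads img[y*W + x], so neighbouring threads
// read neighbouring ints (coalesced).
__global__ void sumCols(const int *img, int *out, int W, int H) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= W) return;
    int s = 0;
    for (int y = 0; y < H; ++y) s += img[y * W + x];
    out[x] = s;
}

// Runs both kernels on an all-ones image and checks the sums on the host.
bool checkSums(int W, int H) {
    std::vector<int> image((size_t)W * H, 1), rows(H), cols(W);
    int *devImage, *devRows, *devCols;
    size_t sizei = (size_t)W * H * sizeof(int);
    cudaMalloc((void**)&devImage, sizei);
    cudaMalloc((void**)&devRows, H * sizeof(int));
    cudaMalloc((void**)&devCols, W * sizeof(int));
    cudaMemcpy(devImage, image.data(), sizei, cudaMemcpyHostToDevice);

    const int threads = 128;
    sumRows<<<(H + threads - 1) / threads, threads>>>(devImage, devRows, W, H);
    sumCols<<<(W + threads - 1) / threads, threads>>>(devImage, devCols, W, H);

    // cudaMemcpy on the default stream waits for the kernels to finish.
    cudaMemcpy(rows.data(), devRows, H * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(cols.data(), devCols, W * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(devImage); cudaFree(devRows); cudaFree(devCols);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int y = 0; y < H; ++y) ok = ok && (rows[y] == W);
    for (int x = 0; x < W; ++x) ok = ok && (cols[x] == H);
    return ok;
}

int main() {
    printf("sums correct: %s\n", checkSums(1024, 1024) ? "yes" : "no");
    return 0;
}
```

If that assumption matches your code, the usual fix for the slow pass is to arrange the work so that consecutive threads of a warp read consecutive elements, for example by mapping `threadIdx.x` to the x coordinate, or by loading a tile into shared memory with coalesced reads and then walking it in whatever order you need.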
Random access: use texture memory or surface memory.
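For the random-access case, here is a minimal sketch of reading the image through a texture object bound to ordinary linear device memory. The kernel `gather`, the helper `gatherMatches` and the index pattern are invented for the example. The CUDA runtime calls (`cudaCreateTextureObject`, `tex1Dfetch`, `cudaDestroyTextureObject`) are the standard texture object API. Whether this beats plain global loads depends on your GPU and access pattern, so measure it.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread fetches one element at an arbitrary index through the texture path.
__global__ void gather(cudaTextureObject_t tex, const int *idx, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<int>(tex, idx[i]);
}

// Hypothetical helper: gathers n scattered elements and checks them on the host.
bool gatherMatches(int n) {
    std::vector<int> image(n), idx(n), result(n);
    for (int i = 0; i < n; ++i) {
        image[i] = 3 * i;
        idx[i] = (int)(((long long)i * 7919) % n);  // scattered read pattern
    }

    int *devImage, *devIdx, *devOut;
    cudaMalloc((void**)&devImage, n * sizeof(int));
    cudaMalloc((void**)&devIdx, n * sizeof(int));
    cudaMalloc((void**)&devOut, n * sizeof(int));
    cudaMemcpy(devImage, image.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(devIdx, idx.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Describe the existing linear buffer as a 1D texture resource.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = devImage;
    resDesc.res.linear.desc = cudaCreateChannelDesc<int>();
    resDesc.res.linear.sizeInBytes = n * sizeof(int);

    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;  // return raw ints, no conversion

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    const int threads = 256;
    gather<<<(n + threads - 1) / threads, threads>>>(tex, devIdx, devOut, n);
    cudaMemcpy(result.data(), devOut, n * sizeof(int), cudaMemcpyDeviceToHost);

    bool ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; i < n; ++i) ok = ok && (result[i] == image[idx[i]]);

    cudaDestroyTextureObject(tex);
    cudaFree(devImage); cudaFree(devIdx); cudaFree(devOut);
    return ok;
}

int main() {
    printf("gather correct: %s\n", gatherMatches(1 << 16) ? "yes" : "no");
    return 0;
}
```

This version keeps the image in the same linear buffer you already allocate with `cudaMalloc`. For true 2D locality you would normally copy the image into a CUDA array and sample it with `tex2D`, which is not shown here.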