I have some device memory that was allocated with a single malloc of H*W*sizeof(float) bytes.
This is to represent an H*W matrix.
I have code where I need to swap the quadrants of the matrix. Can I use cudaMemcpy2D to accomplish this? Would I just need to set spitch and dpitch to W*sizeof(float) and pass pointers to each quadrant of the matrix?
Also, when the cudaMemcpy functions talk about the memory areas not overlapping, does that mean src and dst cannot overlap at all? As in, if I had a 10-byte-wide array that I wanted to shift left by one position, would it fail?
Thanks
You can use cudaMemcpy2D to move sub-blocks around inside a larger linear allocation; there is no problem in doing that. For a densely packed H*W matrix, setting both spitch and dpitch to W*sizeof(float) and passing a pointer to the first element of each quadrant is the right approach, with the copy width given in bytes and the height in rows. The non-overlapping requirement is non-negotiable: a copy whose source and destination overlap is not supported, so do not expect it to work. The source and destination can come from the same allocation, but their address ranges cannot overlap. If you need to do some “in-situ” copying where there is overlap, you might be better served by writing a kernel to do it (see the matrix transpose example in the SDK for a sound way to do that kind of thing).
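As a rough illustration of the quadrant swap, here is a minimal sketch. It makes several assumptions that are not in the original posts: H and W are even, "swap the quadrants" means exchanging diagonally opposite quadrants (top-left with bottom-right, top-right with bottom-left), and a separate quadrant-sized scratch buffer is acceptable. The `swapBlocks` helper and the `CUDA_CHECK` macro are hypothetical names introduced only for this example.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical helper: exchange two sub-blocks of the same matrix.
// pitch is the byte stride of the full matrix (W * sizeof(float));
// each sub-block is blockBytes wide and rows tall. scratch is a separate,
// densely packed buffer, so its pitch equals blockBytes.
static void swapBlocks(float* a, float* b, float* scratch,
                       size_t pitch, size_t blockBytes, size_t rows)
{
    CUDA_CHECK(cudaMemcpy2D(scratch, blockBytes, a, pitch,
                            blockBytes, rows, cudaMemcpyDeviceToDevice));
    CUDA_CHECK(cudaMemcpy2D(a, pitch, b, pitch,
                            blockBytes, rows, cudaMemcpyDeviceToDevice));
    CUDA_CHECK(cudaMemcpy2D(b, pitch, scratch, blockBytes,
                            blockBytes, rows, cudaMemcpyDeviceToDevice));
}

int main()
{
    const size_t H = 4, W = 6;                 // assumed even
    const size_t hh = H / 2, hw = W / 2;       // quadrant height and width
    const size_t pitch = W * sizeof(float);    // spitch and dpitch of the matrix
    const size_t blockBytes = hw * sizeof(float);

    std::vector<float> host(H * W);
    for (size_t i = 0; i < H * W; ++i) host[i] = static_cast<float>(i);

    float* d = nullptr;
    float* scratch = nullptr;
    CUDA_CHECK(cudaMalloc(&d, H * pitch));
    CUDA_CHECK(cudaMalloc(&scratch, hh * blockBytes));
    CUDA_CHECK(cudaMemcpy(d, host.data(), H * pitch, cudaMemcpyHostToDevice));

    // Pointers to the first element of each quadrant.
    float* topLeft     = d;
    float* topRight    = d + hw;
    float* bottomLeft  = d + hh * W;
    float* bottomRight = d + hh * W + hw;

    // Diagonally opposite quadrants occupy disjoint address ranges,
    // so every copy below has non-overlapping source and destination.
    swapBlocks(topLeft, bottomRight, scratch, pitch, blockBytes, hh);
    swapBlocks(topRight, bottomLeft, scratch, pitch, blockBytes, hh);

    CUDA_CHECK(cudaMemcpy(host.data(), d, H * pitch, cudaMemcpyDeviceToHost));
    for (size_t r = 0; r < H; ++r) {
        for (size_t c = 0; c < W; ++c) std::printf("%5.0f", host[r * W + c]);
        std::printf("\n");
    }

    CUDA_CHECK(cudaFree(scratch));
    CUDA_CHECK(cudaFree(d));
    return 0;
}
```

The scratch buffer costs one extra quadrant of device memory, but it keeps every copy's source and destination disjoint. The same staging idea should also cover the overlapping shift from the question: copy the range into a scratch buffer, then copy it back at the shifted offset, so that neither copy overlaps. A custom kernel, as suggested above, avoids the extra traffic if that matters.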