I want to create a 3D float array in CUDA. Here is my code:
#define SIZE_X 128 //numbers in elements
#define SIZE_Y 128
#define SIZE_Z 128
typedef float VolumeType;
cudaExtent volumeSize = make_cudaExtent(SIZE_X, SIZE_Y, SIZE_Z); //The first argument should be SIZE_X*sizeof(VolumeType)??
float *d_volumeMem;
cutilSafeCall(cudaMalloc((void**)&d_volumeMem, SIZE_X*SIZE_Y*SIZE_Z*sizeof(float)));
// ... assign values to d_volumeMem on the GPU
cudaArray *d_volumeArray = 0;
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<VolumeType>();
cutilSafeCall( cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize) );
cudaMemcpy3DParms copyParams = {0};
copyParams.srcPtr = make_cudaPitchedPtr((void*)d_volumeMem, SIZE_X*sizeof(VolumeType), SIZE_X, SIZE_Y); //
copyParams.dstArray = d_volumeArray;
copyParams.extent = volumeSize;
copyParams.kind = cudaMemcpyDeviceToDevice;
cutilSafeCall( cudaMemcpy3D(&copyParams) );
Actually, my program runs well, but I’m not sure the result is right. Here is my problem: the CUDA library documentation says that the first parameter of make_cudaExtent is “Width in bytes” and that the other two are the height and depth in elements. So I think that in my code above, the fifth line should be
cudaExtent volumeSize = make_cudaExtent(SIZE_X*sizeof(VolumeType), SIZE_Y, SIZE_Z);
But with this change, I get an “invalid argument” error from cutilSafeCall( cudaMemcpy3D(&copyParams) ); Why?
Another puzzle is the struct cudaExtent. As the CUDA library documentation states, its component width stands for “Width in elements when referring to array memory, in bytes when referring to linear memory”. So I think that when I refer to volumeSize.width in my code, it should be a number of elements. However, if I use
cudaExtent volumeSize = make_cudaExtent(SIZE_X*sizeof(VolumeType), SIZE_Y, SIZE_Z);
then volumeSize.width would be SIZE_X*sizeof(VolumeType) (128*4), which is a number of bytes instead of a number of elements.
Many CUDA SDK samples use char as the VolumeType, so they just pass SIZE_X as the first argument of make_cudaExtent. But mine is float. Could anyone tell me the right way to create a cudaExtent if I need it to create a 3D array? Thanks a lot!
Let’s review what the documentation for cudaMemcpy3D says: when a CUDA array participates in the copy, the extent is defined in terms of that array’s elements; when no CUDA array participates, the extent is defined in elements of unsigned char. Similarly, the documentation for cudaMalloc3DArray specifies its extent in elements.

So the extent you need to form for both calls needs to have the first dimension in elements (because one of the allocations in the cudaMemcpy3D is an array).

But you potentially have a different problem in your code, because you are allocating the linear memory source d_volumeMem using cudaMalloc. cudaMemcpy3D expects that linear source memory has been allocated with a compatible pitch. Your code is just using a linear allocation of size SIZE_X*SIZE_Y*SIZE_Z*sizeof(float). Now it might be that the dimensions you have chosen produce a compatible pitch for the hardware you are using, but that is not guaranteed. I would recommend using cudaMalloc3D to allocate the linear source memory as well. An expanded demonstration of this built around your little code snippet might look like this:

You can confirm for yourself that the output of the texture reads matches the original source memory on the host.
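For reference, here is a minimal sketch of the recommendation above: the linear source is allocated with cudaMalloc3D (extent width in bytes), the array with cudaMalloc3DArray (extent width in elements), and the copy into the array uses the element-based extent. It is an illustration under assumptions, not a definitive implementation: the CHECK macro is a hypothetical stand-in for cutilSafeCall, the fill pattern is arbitrary, and instead of reading through a texture it simply copies the array back to the host and compares.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define SIZE_X 128 // all sizes are in elements
#define SIZE_Y 128
#define SIZE_Z 128
typedef float VolumeType;

// Hypothetical helper (stand-in for cutilSafeCall): abort on any CUDA error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    const size_t n = (size_t)SIZE_X * SIZE_Y * SIZE_Z;
    std::vector<VolumeType> h_src(n), h_dst(n, 0);
    for (size_t i = 0; i < n; ++i) h_src[i] = (VolumeType)i; // arbitrary test data

    // Linear (pitched) memory: the extent width is in BYTES.
    cudaExtent linearExtent =
        make_cudaExtent(SIZE_X * sizeof(VolumeType), SIZE_Y, SIZE_Z);
    cudaPitchedPtr d_volumeMem;
    CHECK(cudaMalloc3D(&d_volumeMem, linearExtent));

    // CUDA array: the extent width is in ELEMENTS.
    cudaExtent volumeSize = make_cudaExtent(SIZE_X, SIZE_Y, SIZE_Z);
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<VolumeType>();
    cudaArray_t d_volumeArray = 0;
    CHECK(cudaMalloc3DArray(&d_volumeArray, &channelDesc, volumeSize));

    // Host -> pitched device memory. No array participates, so bytes.
    cudaMemcpy3DParms toLinear = {0};
    toLinear.srcPtr = make_cudaPitchedPtr(h_src.data(),
                                          SIZE_X * sizeof(VolumeType),
                                          SIZE_X, SIZE_Y);
    toLinear.dstPtr = d_volumeMem;
    toLinear.extent = linearExtent;
    toLinear.kind   = cudaMemcpyHostToDevice;
    CHECK(cudaMemcpy3D(&toLinear));

    // Pitched device memory -> array. An array participates, so elements.
    cudaMemcpy3DParms toArray = {0};
    toArray.srcPtr   = d_volumeMem; // carries the pitch chosen by cudaMalloc3D
    toArray.dstArray = d_volumeArray;
    toArray.extent   = volumeSize;
    toArray.kind     = cudaMemcpyDeviceToDevice;
    CHECK(cudaMemcpy3D(&toArray));

    // Array -> host, again with the element-based extent.
    cudaMemcpy3DParms toHost = {0};
    toHost.srcArray = d_volumeArray;
    toHost.dstPtr   = make_cudaPitchedPtr(h_dst.data(),
                                          SIZE_X * sizeof(VolumeType),
                                          SIZE_X, SIZE_Y);
    toHost.extent   = volumeSize;
    toHost.kind     = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&toHost));

    // Compare the round-tripped data with the original host data.
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_src[i] != h_dst[i]) ++mismatches;
    printf("mismatches: %zu\n", mismatches);

    CHECK(cudaFree(d_volumeMem.ptr));
    CHECK(cudaFreeArray(d_volumeArray));
    return mismatches == 0 ? 0 : 1;
}
```

The point of the sketch is that two different extents are in play: the byte-based one is used only where no array is involved (the cudaMalloc3D call and the host-to-linear copy), while every call that touches the cudaArray uses the element-based one.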