I am developing a Windows 64-bit application that will manage concurrent execution of different CUDA-algorithms on several GPUs.
My design requires a way of passing pointers to device memory around C++ code (e.g. remembering them as members of my C++ objects).
I know that it is impossible to declare class members with __device__ qualifiers.
However, I couldn’t find a definitive answer as to whether assigning a __device__ pointer to a normal C pointer and then using the latter works. In other words: is the following code valid?
__device__ float *ptr;
cudaMalloc(&ptr, size);
float *ptr2 = ptr;
some_kernel<<<1,1>>>(ptr2);
For me it compiled and behaved correctly but I would like to know whether it is guaranteed to be correct.
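No, that code isn’t strictly valid. While it might work on the host side (more or less by accident), if you tried to dereference ptr directly from device code, you would find it had an invalid value.

The correct way to do what your code implies is to allocate into an ordinary host-side pointer variable with cudaMalloc, copy that pointer's value into the __device__ symbol with cudaMemcpyToSymbol, and pass the host-side pointer to the kernel.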
For CUDA 4.x or newer, change the cudaMemcpyToSymbol call so that it takes the symbol itself rather than the symbol's name as a string.
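As a minimal sketch of that pattern (not the original snippet, which did not survive here), the following assumes the symbol form of cudaMemcpyToSymbol accepted by current toolkits. The kernel body and the helper name run_with_symbol are hypothetical, added only so the result can be checked. The point it illustrates is that cudaMalloc writes the device address into a host variable, so the __device__ symbol only becomes usable from device code after the explicit copy.

```cuda
#include <cuda_runtime.h>

// Static device symbol, as in the question.
__device__ float *ptr;

// Hypothetical kernel body: writes once through the argument and once
// through the __device__ symbol, so both paths can be checked on the host.
__global__ void some_kernel(float *p)
{
    p[0]   = 1.0f;  // pointer received as a kernel argument
    ptr[1] = 2.0f;  // same allocation, reached through the device symbol
}

// Hypothetical helper: returns true if both writes landed in the allocation.
bool run_with_symbol()
{
    const size_t size = 2 * sizeof(float);

    // Ordinary host-side variable that holds a device address.
    float *ptr2 = nullptr;
    if (cudaMalloc(&ptr2, size) != cudaSuccess) return false;

    // Copy the pointer *value* into the device symbol (symbol form, no string).
    if (cudaMemcpyToSymbol(ptr, &ptr2, sizeof(float *)) != cudaSuccess) {
        cudaFree(ptr2);
        return false;
    }

    some_kernel<<<1, 1>>>(ptr2);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    float host[2] = {0.0f, 0.0f};
    cudaError_t err = cudaMemcpy(host, ptr2, size, cudaMemcpyDeviceToHost);
    cudaFree(ptr2);

    return err == cudaSuccess && host[0] == 1.0f && host[1] == 2.0f;
}
```

Calling run_with_symbol() from main and checking its return value is enough to confirm that the kernel sees the same allocation through both the argument and the symbol.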
ptris really superfluous, you can just to something like this:But I suspect that what you are probably looking for is something like the thrust library
device_ptrclass, which is a nice abstraction wrapping the naked device pointer and makes it absolutely clear in code what is in device memory and what is in host memory.
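To make the last suggestion concrete, here is a rough sketch of a host-side class that keeps a thrust::device_ptr as a member, which is close to what the question asks for. The class name DeviceBuffer, its methods, and scale_kernel are hypothetical and only for illustration. The Thrust calls used (device_malloc, device_free, fill, raw_pointer_cast) are the library's own.

```cuda
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/device_malloc.h>
#include <thrust/device_free.h>
#include <thrust/fill.h>

// Hypothetical kernel taking a plain device pointer.
__global__ void scale_kernel(float *p, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] *= factor;
}

// Hypothetical host-side class that remembers a device allocation as a member.
// The member type states that the memory lives on the device.
class DeviceBuffer {
public:
    explicit DeviceBuffer(int n)
        : n_(n), data_(thrust::device_malloc<float>(n)) {}
    ~DeviceBuffer() { thrust::device_free(data_); }

    // Non-copyable, so the allocation is freed exactly once.
    DeviceBuffer(const DeviceBuffer &) = delete;
    DeviceBuffer &operator=(const DeviceBuffer &) = delete;

    // Thrust algorithms accept device_ptr directly and run on the device.
    void fill(float v) { thrust::fill(data_, data_ + n_, v); }

    // Hand-written kernels need the raw pointer.
    void scale(float factor)
    {
        scale_kernel<<<(n_ + 255) / 256, 256>>>(
            thrust::raw_pointer_cast(data_), n_, factor);
        cudaDeviceSynchronize();
    }

    // Reading one element copies it from device to host.
    float at(int i) const
    {
        float v = data_[i];
        return v;
    }

private:
    int n_;
    thrust::device_ptr<float> data_;
};
```

For example, constructing DeviceBuffer buf(4), calling buf.fill(1.5f) and then buf.scale(2.0f) leaves every element at 3.0f, which buf.at(0) reads back on the host. If you only ever launch your own kernels, a plain float * member filled in by cudaMalloc works the same way. The wrapper mainly buys you type-level clarity.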