I’ve searched all over for some insight on how exactly to use classes with CUDA, and while there is a general consensus that it can be done and apparently is being done by people, I’ve had a hard time finding out how to actually do it.
I have a class which implements a basic bitset with operator overloading and the like. I need to be able to instantiate objects of this class on both the host and the device, copy between the two, etc. Do I define this class in a .cu? If so, how do I use it in my host-side C++ code? The functions of the class do not need to access special CUDA variables like threadId; it just needs to be able to be used host and device side.
Thanks for any help, and if I’m approaching this in completely the wrong way, I’d love to hear alternatives.
Define the class in a header that you #include, just like in C++.
Any method that must be called from device code should be defined with both the __device__ and __host__ declspecs, including the constructor and destructor if you plan to use new/delete on the device (note that new/delete require CUDA 4.0 and a GPU of compute capability 2.0 or higher).
You probably want to define a macro that expands to these declspecs only when the CUDA compiler is processing the file.
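A minimal sketch of such a macro follows. The name CUDA_CALLABLE_MEMBER is an assumption chosen for illustration; any name works. It relies only on __CUDACC__, which NVCC defines when compiling CUDA code.

```cpp
// CUDA_CALLABLE_MEMBER is a hypothetical name picked for this sketch.
// Under NVCC (__CUDACC__ defined) it marks a function as callable from
// both host and device; under a plain host compiler it expands to nothing.
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif
```

Putting this in the same header as the class keeps the header usable from both .cu files and ordinary .cpp files.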
Then use this macro on your member functions.
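As an illustration, here is roughly what that could look like for a small bitset along the lines of the one in the question. Bitset64 and its members are hypothetical stand-ins, not the asker's actual class, and CUDA_CALLABLE_MEMBER is again an assumed macro name, repeated here so the header is self-contained.

```cpp
#include <cstdint>

// Assumed macro name; expands to nothing when a host-only compiler is used.
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif

// Hypothetical 64-bit bitset standing in for the class from the question.
class Bitset64 {
public:
    CUDA_CALLABLE_MEMBER Bitset64() : bits_(0) {}
    CUDA_CALLABLE_MEMBER explicit Bitset64(std::uint64_t bits) : bits_(bits) {}

    // Set bit i; the caller must ensure i < 64.
    CUDA_CALLABLE_MEMBER void set(unsigned i) { bits_ |= (std::uint64_t(1) << i); }

    // Return true if bit i is set; the caller must ensure i < 64.
    CUDA_CALLABLE_MEMBER bool test(unsigned i) const { return ((bits_ >> i) & 1u) != 0; }

    // Bitwise AND of two bitsets.
    CUDA_CALLABLE_MEMBER Bitset64 operator&(const Bitset64& other) const {
        return Bitset64(bits_ & other.bits_);
    }

    CUDA_CALLABLE_MEMBER bool operator==(const Bitset64& other) const {
        return bits_ == other.bits_;
    }

private:
    std::uint64_t bits_;
};
```

Since this sketch holds only a plain integer, its objects are trivially copyable, so they can be moved between host and device as raw bytes (for example with cudaMemcpy). A class that owns pointers would need more care.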
The reason for this is that only the CUDA compiler knows __device__ and __host__; your host C++ compiler will raise an error.
Edit: Note that __CUDACC__ is defined by NVCC when it is compiling CUDA files. This can be either when compiling a .cu file with NVCC or when compiling any file with the command line option -x cu.