I have a class that I use in both host and device code, to make passing data easier. This class has some methods that manipulate the data. A simple example is:
struct Vector {
    float x, y, z;
    __host__ __device__ Vector(float _x, float _y, float _z) {
        //...
    }
};
If I implement this class in a header file, it works fine and nvcc is happy. However, if I try to implement the constructor in a source file, nvcc complains that the constructor is non-inlined. Is there any way to work around this, or is it just a limitation of the compiler?
Up until CUDA 5.0 the CUDA compiler had the restriction that everything required by a kernel (i.e. a __global__ function) must be in a single translation unit. For pre-Fermi devices (i.e. compute capability 1.x) the compiler also had to inline all __device__ functions. So if you have the struct's constructor defined in file a.cu and the __global__ kernel that uses the struct defined in b.cu, then when the compiler is processing b.cu it is unable to find the definition of the __device__ function.
With CUDA 5.0 you are able to compile the two files separately and link them together. This still requires a Fermi or later device (compute capability 2.x or later).
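As a rough sketch of the separate-compilation layout described above, the constructor can be declared in a header and defined in its own .cu file. The file names (vector.h, vector.cu, main.cu), the kernel name sumComponents, and the architecture placeholder sm_XX are assumptions made for illustration, not something from the question. The three parts are shown in one listing, which also builds as a single file; split into separate files, they would need relocatable device code enabled at compile time.

```cuda
// Assumed build steps when split into separate files (CUDA 5.0 or later;
// replace sm_XX with your target architecture):
//   nvcc -arch=sm_XX -dc vector.cu main.cu      (compile with relocatable device code)
//   nvcc -arch=sm_XX vector.o main.o -o app     (device link + host link)

// ---- vector.h (hypothetical file name) ----
struct Vector {
    float x, y, z;
    // Declaration only; the body lives in vector.cu.
    __host__ __device__ Vector(float _x, float _y, float _z);
};

// ---- vector.cu (would start with: #include "vector.h") ----
__host__ __device__ Vector::Vector(float _x, float _y, float _z)
    : x(_x), y(_y), z(_z) {}

// ---- main.cu (would start with: #include "vector.h") ----
#include <cstdio>
#include <cuda_runtime.h>

// Kernel that calls the constructor defined in the other translation unit.
__global__ void sumComponents(float *out) {
    Vector v(1.0f, 2.0f, 3.0f);
    *out = v.x + v.y + v.z;
}

int main() {
    float *d_out = NULL;
    float h_out = 0.0f;

    if (cudaMalloc((void **)&d_out, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    sumComponents<<<1, 1>>>(d_out);

    // cudaMemcpy waits for the kernel and reports any launch/execution error.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("sum = %f\n", h_out);
    return 0;
}
```

Without the -dc step (equivalently -rdc=true with -c), compiling main.cu on its own would leave the device-side call to the constructor unresolved, which is the situation the question ran into. If the toolchain is older than CUDA 5.0 or the target is compute capability 1.x, keeping the definition in the header remains the practical option.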