I’m trying to pass some POD to a kernel which has as parameters some non-POD, and has non explicit constructors. Idea behind that is: allocate some memory on the host, pass the memory to the kernel, and it encapsulate the memory in the objects without the user to explicitly do that step.
The constructors are marked as __device__ code, but they are not called when passing the parameters, and I can’t figure out why.
My question is not really related about how should I do the thing, but trying to understand what’s happening behind the scenes.
Here an example (I’m using CUDA 5 with a GPU of capability 2.1, hence the printf).
#include <stdio.h>
struct Test {
__device__ Test() {
printf("Default\n"),
_n = 0;
}
__device__ Test(int n) {
printf("Construct %d\n", n);
_n = n;
}
__device__ Test(const Test &t) {
printf("Copy constr %d\n", t._n);
_n = t._n;
}
__device__ Test &operator=(const Test &t) {
printf("Assignment %d\n", t._n);
_n = t._n;
return *this;
}
__device__ int calc() const {
printf("Calculating %d\n", threadIdx.x + 10 * _n);
return threadIdx.x + 10 * _n;
}
int _n;
};
__global__ void dosome(Test a, Test b) {
printf("Kernel data %d %d\n", a._n, b._n);
a.calc();
b.calc();
}
int main(int argc, char **argv) {
dosome<<<1, 2>>>(2, 3);
cudaError_t cudaerr = cudaDeviceSynchronize();
if (cudaerr != cudaSuccess)
printf("kernel launch failed with error:\n\t%s\n",cudaGetErrorString(cudaerr));
return 0;
}
EDIT: Forgot to say that, none of the constructor message is printed, but the calc and kernel message are.
EDIT2: Is it guaranteed that CUDA will initialize a Test object before copying it on the device?
You have to see a constructor just like a normal method. If you qualify it with
__host__, then you’ll be able to call it host-side. If you qualify it with__device__, you’ll be able to call it device-side. If you qualify it with both, you’ll be able to call it on both sides.What happens when you do
dosome<<<1, 2>>>(2, 3);is that the two objects are implictly constructed (because your constructor is notexplicit, so maybe that’s confusing you too) host side and thenmemcpy‘d to the device. There is no copy-constructor involved in the process.Let’s illustrate this:
Now if you change your kernel to take
ints instead ofTest: