Hey there,
I have the following piece of code:
#if USE_CONST == 1
__constant__ double PNT[ SIZE ];
#else
__device__ double *PNT;
#endif
and a bit later I have:
#if USE_CONST == 0
cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
#else
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
#endif
where point is defined earlier in the code. With USE_CONST=1 everything works as expected, but with USE_CONST=0 it doesn’t. I access the array in my kernel function via
PNT[ index ]
What is the difference between the two variants that makes the second one fail?
Thanks!
The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:
or alternatively:
which might be a bit faster if you are planning to access the symbol from the host API more than once.
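The snippets this answer refers to are not preserved here, so what follows is only a sketch of the "look the address up once, then reuse it" idea, not the original code. It assumes a current toolkit, where the symbol itself (not its name as a string) is passed to cudaGetSymbolAddress. The value of SIZE and the helper name upload_constant are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

#define SIZE 16  /* hypothetical size, for illustration only */

__constant__ double PNT[SIZE];

/* Hypothetical helper: copies SIZE doubles from host memory into PNT.
   The device address of the symbol is looked up on the first successful
   call and reused afterwards. Returns the first CUDA error, or cudaSuccess. */
cudaError_t upload_constant(const double *point)
{
    static void *pnt_addr = NULL;  /* cached device address of PNT */

    if (pnt_addr == NULL) {
        cudaError_t err = cudaGetSymbolAddress(&pnt_addr, PNT);
        if (err != cudaSuccess) {
            pnt_addr = NULL;  /* retry the lookup on the next call */
            return err;
        }
    }
    /* A plain cudaMemcpy to the looked-up address fills the constant array. */
    return cudaMemcpy(pnt_addr, point, sizeof(double) * SIZE,
                      cudaMemcpyHostToDevice);
}
```

Calling upload_constant(point) repeatedly only pays for the symbol lookup once, which is presumably the saving meant above.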
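EDIT: I misunderstood the question. For the global memory version, do something similar to the second version for constant memory. PNT is a __device__ variable, so the host cannot take its address with &PNT and hand it to cudaMalloc or cudaMemcpy. Allocate into an ordinary host-side pointer first, then copy that pointer value into the PNT symbol.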
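Here is a minimal sketch of how the global memory version could look. It is my own illustration, not the asker's code: SIZE, the helper upload_global, and the kernel read_pnt are all hypothetical names. It stores the pointer with cudaMemcpyToSymbol; cudaGetSymbolAddress followed by cudaMemcpy should work just as well.

```cuda
#include <cuda_runtime.h>
#include <stddef.h>

#define SIZE 16  /* hypothetical size, for illustration only */

__device__ double *PNT;  /* device-side pointer, set from the host below */

/* Hypothetical kernel: reads through PNT the same way the question does. */
__global__ void read_pnt(double *out)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < SIZE)
        out[index] = PNT[index];
}

/* Hypothetical helper: allocates a device buffer, fills it from `point`,
   and then stores the buffer's address in the device variable PNT. */
cudaError_t upload_global(const double *point)
{
    double *d_buf = NULL;  /* host-side copy of the device pointer */

    cudaError_t err = cudaMalloc((void **)&d_buf, sizeof(double) * SIZE);
    if (err != cudaSuccess)
        return err;

    err = cudaMemcpy(d_buf, point, sizeof(double) * SIZE,
                     cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(d_buf);
        return err;
    }

    /* Copy the pointer value itself (not the data) into the symbol. */
    err = cudaMemcpyToSymbol(PNT, &d_buf, sizeof(d_buf));
    if (err != cudaSuccess)
        cudaFree(d_buf);
    return err;
}
```

After upload_global(point) succeeds, a kernel such as read_pnt can index PNT[ index ] exactly as in the constant memory variant. The buffer is never freed in this sketch; a real program would keep d_buf around and call cudaFree on it when done.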