I have a weird problem, so I thought I would ask and see if someone more experienced than me could see a solution.
I am writing a program with CUDA C/C++, and I have some constant integers that specify various things, like coordinates of the bounds of the calculation, etc.. Currently I just have those things in global device memory. They are accessed by every thread in every kernel call, and so I figured that if they are in global memory, then they never are being cached or broadcast (right?). And so these little integers are taking up a lot (relatively) of overhead, and have a lot of ‘read redundancy.’
So I declare in a header:
__constant__ int* number;
I include that header, and, when I do memory stuff, I do:
cutilSafeCall( cudaMemcpyToSymbol(number, &(some_host_int), sizeof(int) );
I pass number into all my kernel’s then:
__global__ void magical_kernel(int* number, ...){
//and I access 'number' like this
int data_thingy = big_array[ *number ];
}
My code crashes. With number in global memory, it is just fine. I have determined that it crashes sometime upon accessing number within the kernel. This means that either I am accessing or allocating it wrong. If it holds the wrong value, it will also cause a crash, because it is used to index into arrays.
To conclude, I will ask a few questions. First, what am I doing wrong? As a bonus: is there a better way than constant memory to accomplish this task – I don’t know the value of number at compile time, so a simple #define won’t work. Will constant memory even speed the code up at all, or has it been cached and broadcasted all along? Could I somehow put the data in shared memory for each threadblock and have it remain in shared memory through multiple kernel calls?
There are several problems here:
numberas a pointer, but never assigned it a value which is valid address in GPU memoryint * numberdefined inmagic_kernelis not the same variable as the__constant__ int * variabledefined as compilation unit scope.cudaMemcpyToSymbolcall is almost certainly incorrect.If you don’t understand why either of the first two point are true, you have some revision to do on pointers and scope in C++.
Based on your response to a now deleted answer, I suspect what you are actually trying to do is this:
i.e.
numberis intended to be an integer in constant memory, not a pointer, and not a kernel argument.EDIT: here is an exmaple which shows this in action:
You should be able to run this yourself and confirm it works as advertised.