So, if I have a device (or global) function that creates/copies some data into shared memory and I later call another device function, like so:
__global__ void a(){
    __shared__ int blah=0;
    fun();
}
__device__ void fun(){
    blah = 1; //perform some operations
    //do whatever
}
I’m a bit rusty with my CUDA. I think you might have had to “redefine” the shared variable (I assume the operation checked whether a shared variable of that name existed and, if so, bound to it); this had the effect of creating context, so basically the variable didn’t just come out of nowhere. Alternatively, if it’s similar to having a global variable in standard C/C++ and I can just reference it, like I did above, that would be great.
I am familiar with memory hierarchy, I’m just rusty on the semantics of creating/referencing memory.
Please advise on whether the above sketch would work. Thanks.
No, that won’t work in CUDA, any more than it would work in standard C99. Currently, the preferred method of __device__ function compilation is inline expansion (they are also compiled as standalone code objects for the Fermi architecture), but even so, __device__ functions must still obey the standard syntax and scope conventions of C99. The variable blah is declared inside the kernel, so its name is not visible inside fun(). You need to pass anything that doesn’t have compilation unit scope to __device__ functions by reference (for example, as a pointer argument).
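As a rough illustration of the point above, here is a minimal sketch of the question's code reworked so that the shared variable is handed to the __device__ function as a pointer. The names a, fun and blah come from the question; the out parameter, the block size of 32 and the host-side check in main are assumptions added only to make the example self-contained. Note also that a __shared__ variable cannot carry an initializer, so the sketch assigns the initial value explicitly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Takes the shared variable by pointer: the name `blah` is local to the
// kernel, so the device function can only reach it through an argument.
__device__ void fun(int *blah)
{
    // Let a single thread do the write so the block does not race on it.
    if (threadIdx.x == 0) {
        *blah = 1;
    }
}

// `out` is a hypothetical output buffer used only to observe the result.
__global__ void a(int *out)
{
    // __shared__ variables cannot be initialized at declaration.
    __shared__ int blah;
    if (threadIdx.x == 0) {
        blah = 0;
    }
    __syncthreads();

    fun(&blah);
    __syncthreads();  // make thread 0's write visible to the whole block

    out[threadIdx.x] = blah;
}

int main()
{
    const int n = 32;  // assumed block size for this sketch
    int h_out[n];
    int *d_out = nullptr;

    cudaMalloc(&d_out, n * sizeof(int));
    a<<<1, n>>>(d_out);
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Every thread should have read the value written inside fun().
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != 1) {
            printf("mismatch at %d: %d\n", i, h_out[i]);
            return 1;
        }
    }
    printf("all threads saw blah == 1\n");
    return 0;
}
```

The pointer argument is what gives fun() access to the kernel's shared variable; the two __syncthreads() calls are there only because one thread writes a value that the other threads in the block then read.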