I would like to instantiate a class in CUDA code that shares some of its members with the other threads in the same block.
However, when I try to compile the following code with nvcc 4.2, I get the error `attribute "shared" does not apply here`:
```cuda
class SharedSomething {
public:
    __shared__ int i; // this is not allowed
};

__global__ void run() {
    SharedSomething something;
}
```
What is the rationale behind that? Is there a workaround to achieve the desired behavior (class members shared by all threads of one block)?
Rost explained the rationale behind the limitation. To answer the second part of the question, a simple workaround is to have the kernel declare the shared memory and have the class hold a pointer to it, initialized for example in the class constructor.
Caveat: the code was written in the browser and is untested. It is a trivial example, but the concept extends to real code; I have used this technique myself.
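A minimal sketch of the pointer workaround, assuming a fixed block size of 128 threads (`kThreadsPerBlock`). The `set`/`get` methods and the host helper `run_demo` are hypothetical names chosen for illustration. Each thread constructs its own `SharedSomething` object, but every object in a block points at the same `__shared__` array declared in the kernel, so a value written by one thread is visible to the others after `__syncthreads()`.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Assumed block size for this sketch; the kernel must be launched
// with exactly this many threads per block.
constexpr int kThreadsPerBlock = 128;

// Each thread owns its own SharedSomething, but all instances in a block
// point at the same __shared__ array, which is owned by the kernel.
class SharedSomething {
public:
    __device__ explicit SharedSomething(int* shared) : data_(shared) {}

    // Hypothetical accessor: write one slot of the block-wide array.
    __device__ void set(int slot, int value) { data_[slot] = value; }

    // Hypothetical accessor: read a slot, possibly written by another thread.
    __device__ int get(int slot) const { return data_[slot]; }

private:
    int* data_;  // points into __shared__ memory declared in the kernel
};

__global__ void run(int* out) {
    // The shared memory lives in the kernel, not in the class.
    __shared__ int buffer[kThreadsPerBlock];

    SharedSomething something(buffer);
    int tid = threadIdx.x;

    something.set(tid, tid * 10);
    __syncthreads();  // make every thread's write visible to the whole block

    // Read the slot written by the "mirror" thread to show the data is shared.
    int mirror = kThreadsPerBlock - 1 - tid;
    out[blockIdx.x * kThreadsPerBlock + tid] = something.get(mirror);
}

// Hypothetical host helper: launches `blocks` blocks and copies the result back.
std::vector<int> run_demo(int blocks) {
    const int n = blocks * kThreadsPerBlock;
    int* d_out = nullptr;

    cudaError_t err = cudaMalloc(&d_out, n * sizeof(int));
    assert(err == cudaSuccess);

    run<<<blocks, kThreadsPerBlock>>>(d_out);
    err = cudaGetLastError();
    assert(err == cudaSuccess);

    std::vector<int> out(n);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    err = cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    (void)err;  // silence unused-variable warnings when NDEBUG is defined

    cudaFree(d_out);
    return out;
}
```

Called from host code as `std::vector<int> out = run_demo(2);`, each thread `t` reads the value written by thread `127 - t` of its own block, so `out[0]` and `out[128]` both hold 1270. The design keeps the decision about how much shared memory exists in the kernel, while the class only borrows a pointer to it.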