I have a struct Primitive, defined as follows:
typedef struct Primitive {
    float m[12];
    float invm[12];
    enum PrimitiveType type;
    int rayDensity;
    float util1;
    float util2;
} Primitive;
I pass an array of these structs to my kernel in a constant memory buffer:
__constant Primitive *objects;
As part of an optimization exercise I want to try loading the structs into local memory, so my kernel has code along these lines:
__kernel void test(int n_objects, __constant Primitive *objects) {
    local Primitive pFrom, pTo;
    for (int i = 0; i < n_objects; i++) {
        pFrom = objects[i];
    }
}
When I run this I get a compilation error:
ptxas application ptx input, line 42; error: State space mismatch between instruction and address in instruction 'ld'
As an experiment, I tried copying the struct to a private variable first and then to the local variable:
__kernel void test(int n_objects, __constant Primitive *objects) {
    Primitive pF, Pt;
    local Primitive pFrom, pTo;
    for (int i = 0; i < n_objects; i++) {
        pF = objects[i];
        pFrom = pF;
    }
}
This compiles and runs, but the object does not seem to be deeply copied into the local variable pFrom.
Please note that my code samples are purely illustrative; I have stripped out everything else for the sake of brevity. Also, my code works fine when I use the Primitive structs directly from constant memory.
Does anyone know what I am missing here? Surely it is something fundamental about deep copying or OpenCL address spaces.
What you need is the async_work_group_copy function, which copies data between global and local memory on behalf of the whole work-group. You can wait for the asynchronous copy to finish with the wait_group_events function.
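As a rough sketch of how that might look for the kernel in the question, under some assumptions that are mine rather than the original poster's: async_work_group_copy only accepts pointers to built-in scalar or vector types and only copies between __global and __local memory, so this version passes the buffer as __global instead of __constant and copies each struct as a run of 32-bit words. It also assumes sizeof(Primitive) is a whole number of uint words, which should hold for this layout on typical devices but is worth checking. The enumerator names are placeholders, since the enum is not shown in the question.

```c
// Hypothetical enumerators; the question does not show this enum.
enum PrimitiveType { PRIM_TYPE_A, PRIM_TYPE_B };

typedef struct Primitive {
    float m[12];
    float invm[12];
    enum PrimitiveType type;
    int rayDensity;
    float util1;
    float util2;
} Primitive;

// Assumption: sizeof(Primitive) is an exact multiple of sizeof(uint).
#define PRIMITIVE_WORDS (sizeof(Primitive) / sizeof(uint))

// Assumption: the host now binds the buffer as __global rather than __constant.
__kernel void test(int n_objects, __global const Primitive *objects) {
    __local Primitive pFrom;

    for (int i = 0; i < n_objects; i++) {
        // Every work-item in the group must reach this call with the
        // same arguments; the copy is performed by the group as a whole.
        event_t ev = async_work_group_copy((__local uint *)&pFrom,
                                           (__global const uint *)&objects[i],
                                           PRIMITIVE_WORDS, 0);

        // Block until the copy into local memory has completed.
        wait_group_events(1, &ev);

        // ... read pFrom here ...

        // Make sure no work-item starts the next copy while others
        // are still reading pFrom from this iteration.
        barrier(CLK_LOCAL_MEM_FENCE);
    }
}
```

Whether this is faster than reading straight from constant memory depends on the device, so it is worth measuring both variants.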
Hope this helps.