I have some data to be processed, with each block being responsible for a given subset of the data.
Due to the nature of my application, I want this data to reside in texture memory. However, the data is too big to fit in a single texture.
If I understood correctly, I can have multiple texture references, but not an array of texture references.
As I need to process a different subset of the data in each block, I was thinking of doing something (in the kernel) like:
    while(counter < 10000) {
        if(blockIdx.x == 0)
            foo = tex2D(tex0, x, y);
        else if(blockIdx.x == 1)
            foo = tex2D(tex1, x, y);
        ...
    }
But not only is this ugly, I’m also not sure whether it would cause divergence problems.
Doing something like
    texture<int, 2, cudaReadModeElementType> ref;
    (..)
    /* kernel code from now on */
    if(blockIdx.x == 0)
        ref = tex0;
    else if(blockIdx.x == 1)
        ref = tex1;
    ...
    while(counter < 10000)
        foo = tex2D(ref, x, y);
also doesn’t quite seem right, as I believe texture references are global and not private to threads.
Is there any other alternative? Thank you.
If possible, you should try to keep your texture data in a single texture, and apply some transformation to the coordinates as needed to fit within the hardware limitations.
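As a rough sketch of the single-texture idea, assume every block's subset is a tile of the same size and the tiles are stacked vertically in one 2D texture, so each block only has to add its own row offset before fetching. The names texAll, tileRow and readTile and the tileWidth/tileHeight parameters are hypothetical, and the stacked height still has to fit within the 2D texture size limit of the device. The texture reference API is used only to match the question; it was removed in CUDA 12, so this needs an older toolkit.

```cuda
// Hypothetical single 2D texture holding every block's tile, stacked vertically.
// Assumes the default reference settings: unnormalized coordinates, point sampling.
texture<int, 2, cudaReadModeElementType> texAll;

// Row in the stacked texture for local row y of the tile owned by `block`.
__host__ __device__ inline int tileRow(int block, int y, int tileHeight)
{
    return block * tileHeight + y;
}

__global__ void readTile(int *out, int tileWidth, int tileHeight)
{
    int x = threadIdx.x;
    int y = threadIdx.y;
    if (x >= tileWidth || y >= tileHeight) return;

    // Adding 0.5f addresses the centre of the texel.
    int foo = tex2D(texAll, x + 0.5f,
                    tileRow(blockIdx.x, y, tileHeight) + 0.5f);

    // Write the value back so the fetch is observable (illustration only).
    out[(blockIdx.x * tileHeight + y) * tileWidth + x] = foo;
}
```

With this layout the kernel contains a single fetch and no branch on blockIdx.x at all.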
Otherwise, you can select among multiple texture references via predication: only the fetch whose predicate evaluates to true will actually generate a texture memory access.
The tex1dfetch_big.cu sample from The CUDA Handbook shows how to do this in order to address more elements than the 27-bit indices supported by the hardware allow.
https://github.com/ArchaeaSoftware/cudahandbook/blob/checkpoint/texturing/tex1dfetch_big.cu
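A minimal sketch in the spirit of that sample (not its actual code) could look like the following. It assumes one large linear buffer split into consecutive chunks of at most 2^27 elements, each bound to its own 1D texture reference. The names tex0, tex1, texIdOf, texOffsetOf, fetchBig and copyOut are hypothetical, only two textures are shown, and the legacy texture reference API requires a toolkit older than CUDA 12.

```cuda
#include <cstddef>

#define TEX_INDEX_BITS 27  // per-texture index width, as stated above

// Hypothetical texture references, one per chunk of the large buffer.
texture<int, 1, cudaReadModeElementType> tex0;
texture<int, 1, cudaReadModeElementType> tex1;

// Which texture holds element `index` (the high bits of the index).
__host__ __device__ inline int texIdOf(size_t index)
{
    return (int)(index >> TEX_INDEX_BITS);
}

// Offset of element `index` inside its texture (the low 27 bits).
__host__ __device__ inline int texOffsetOf(size_t index)
{
    return (int)(index & (((size_t)1 << TEX_INDEX_BITS) - 1));
}

// Each branch names its texture reference directly, so nothing is ever
// assigned to a texture reference; only the selected fetch is issued.
__device__ inline int fetchBig(size_t index)
{
    int id = texIdOf(index);
    int i  = texOffsetOf(index);
    int v  = 0;
    if (id == 0)      v = tex1Dfetch(tex0, i);
    else if (id == 1) v = tex1Dfetch(tex1, i);
    return v;
}

__global__ void copyOut(int *out, size_t n)
{
    size_t idx = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) out[idx] = fetchBig(idx);
}
```

If the selector is blockIdx.x, as in the question, it has the same value for every thread of a block, so all threads of a warp take the same branch and the selection does not diverge.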