I have a CUDA kernel which I’m compiling to a cubin file without any special flags:
    nvcc text.cu -cubin

It compiles, but with this message:

    Advisory: Cannot tell what pointer points to, assuming global memory space

along with a reference to a line in a temporary .cpp file. I can make the message go away by commenting out seemingly arbitrary parts of the code, which makes no sense to me.
The kernel is as follows:
    __global__ void string_search(char** texts, int* lengths, char* symbol,
                                  int* matches, int symbolLength)
    {
        int localMatches = 0;
        int blockId = blockIdx.x + blockIdx.y * gridDim.x;
        int threadId = threadIdx.x + threadIdx.y * blockDim.x;
        int blockThreads = blockDim.x * blockDim.y;

        // One slot per thread; assumes exactly 32 threads per block.
        __shared__ int localMatchCounts[32];

        // Each thread starts at its own offset and strides by the block size.
        for (int i = threadId; i < lengths[blockId] - (symbolLength - 1); i += blockThreads)
        {
            if (texts[blockId][i] == symbol[0])
            {
                bool breaking = false;   // reset for every candidate position
                for (int j = 1; j < symbolLength; j++)
                {
                    if (texts[blockId][i + j] != symbol[j])
                    {
                        breaking = true;
                        break;
                    }
                }
                if (breaking)
                    continue;
                localMatches++;
            }
        }

        localMatchCounts[threadId] = localMatches;
        __syncthreads();

        // Thread 0 sums the per-thread counts for this block.
        if (threadId == 0)
        {
            int sum = 0;
            for (int i = 0; i < 32; i++)
                sum += localMatchCounts[i];
            matches[blockId] = sum;
        }
    }
If I replace the line

    localMatchCounts[threadId] = localMatches;

after the first for loop with

    localMatchCounts[threadId] = 5;

it compiles with no notices. The same happens if I comment out seemingly random parts of the loop above that line. I have also tried replacing the __shared__ array with a normal array, with no effect. Can anyone tell me what the problem is?
The system is Vista 64-bit, for what it’s worth.
Edit: I fixed the code so it actually works, but it still produces the compiler notice. The warning does not seem to be a problem, at least with regard to correctness (it might affect performance).
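One way to confirm that correctness is to compare the kernel's output against a plain host-side C++ reference. The reference should produce one count per text, which is what matches[blockId] should hold when block b searches texts[b]. It counts every start position, overlapping matches included. This is a minimal sketch, assuming the texts are also available on the host as std::string. The names count_matches and reference_matches are hypothetical.

```cpp
#include <string>
#include <vector>

// Counts the positions in `text` where `symbol` starts (overlapping matches
// included), i.e. the quantity the kernel accumulates for one block.
int count_matches(const std::string& text, const std::string& symbol)
{
    if (symbol.empty() || symbol.size() > text.size())
        return 0;
    int count = 0;
    // Same bound as the kernel: i < length - (symbolLength - 1)
    for (std::string::size_type i = 0; i + symbol.size() <= text.size(); ++i)
    {
        if (text.compare(i, symbol.size(), symbol) == 0)
            ++count;
    }
    return count;
}

// One result per text, to compare with matches[] copied back from the device.
std::vector<int> reference_matches(const std::vector<std::string>& texts,
                                   const std::string& symbol)
{
    std::vector<int> result;
    result.reserve(texts.size());
    for (const std::string& t : texts)
        result.push_back(count_matches(t, symbol));
    return result;
}
```

Comparing reference_matches(texts, symbol) element by element with the device results on a few small inputs is a quick way to catch indexing mistakes in the kernel.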
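Arrays of pointers like char** are problematic in kernels. The pointer array and every pointer stored in it must refer to device memory, because a kernel has no access to the host’s memory. In addition, each texts[blockId] is a pointer that is only loaded from memory at run time, so the compiler cannot tell which memory space it points to. That is what the advisory says, and the compiler then assumes global memory. When the loop’s result is not used (for example with localMatchCounts[threadId] = 5), the compiler removes the loop as dead code, so the loads through texts disappear and the advisory disappears with them.
It is better to allocate a single contiguous buffer and divide it in a way that enables parallel access.
In this case I’d define two 1D arrays. The first contains all the strings placed one after another. The second, sized 2*numberOfStrings, contains the offset of each string within the first array and its length.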
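The layout described above can be sketched in plain host C++. This sketch assumes the strings are held in a std::vector<std::string>, and the names PackedStrings, pack_strings and count_in_block are hypothetical. count_in_block emulates on the CPU what one block would do: it locates its string through the metadata instead of dereferencing a char**. As a result, the device side would only need flat char* and int* arguments.

```cpp
#include <string>
#include <vector>

// Hypothetical packed layout: all strings back to back in one buffer, with
// metadata[2*k] = offset of string k and metadata[2*k + 1] = its length.
struct PackedStrings
{
    std::vector<char> buffer;   // would become one device allocation
    std::vector<int>  metadata; // 2 * numberOfStrings ints
};

PackedStrings pack_strings(const std::vector<std::string>& st)
{
    PackedStrings p;
    p.metadata.reserve(2 * st.size());
    int lastpos = 0;
    for (const std::string& s : st)
    {
        p.metadata.push_back(lastpos);                    // offset
        p.metadata.push_back(static_cast<int>(s.size())); // length
        p.buffer.insert(p.buffer.end(), s.begin(), s.end());
        lastpos += static_cast<int>(s.size());
    }
    return p;
}

// CPU emulation of block b: find its string via the metadata, then count
// every position where `symbol` starts.
int count_in_block(const PackedStrings& p, int b, const std::string& symbol)
{
    const char* text = p.buffer.data() + p.metadata[2 * b];
    int length = p.metadata[2 * b + 1];
    int symbolLength = static_cast<int>(symbol.size());
    if (symbolLength == 0)
        return 0;
    int count = 0;
    for (int i = 0; i < length - (symbolLength - 1); ++i)
    {
        int j = 0;
        while (j < symbolLength && text[i + j] == symbol[j])
            ++j;
        if (j == symbolLength)
            ++count;
    }
    return count;
}
```

On the device side, buffer.data() and metadata.data() are what would be copied into two allocations made with cudaMalloc and filled with cudaMemcpy, replacing the char** and lengths arguments.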
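For example, preparation for the kernel:

    char* buffer = st[0] + st[1] + st[2] + ….;   // pseudocode: all strings concatenated
    int* metadata = new int[numberOfStrings * 2];
    int lastpos = 0;
    for (int cnt = 0; cnt < 2 * numberOfStrings; cnt += 2)
    {
        metadata[cnt]     = lastpos;               // offset of string cnt/2 within buffer
        metadata[cnt + 1] = length(st[cnt / 2]);   // length of string cnt/2
        lastpos += metadata[cnt + 1];              // next string starts right after this one
    }

In kernel: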