I have a CUDA program that calls the kernel repeatedly within a for loop. The
code computes all rows of a matrix by using the values computed in the previous one
until the entire matrix is done. This is basically a dynamic programming algorithm.
The code below fills the (i,j) entry of many separate matrices in parallel with
the kernel.
for(i = 1; i <=xdim; i++){
for(j = 1; j <= ydim; j++){
start3time = clock();
assign5<<<BLOCKS, THREADS>>>(Z, i, j, x, y, z)
end3time = clock();
diff = static_cast<double>(end3time-start3time)/(CLOCKS_PER_SEC / 1000);
printf("Time for i=%d j=%d is %f\n", i, j, diff);
}
}
The kernel assign5 is straightforward
__global__ void assign5(float* Z, int i, int j, int x, int y, int z) {
int id = threadIdx.x + blockIdx.x * blockDim.x;
char ch = database[j + id];
Z[i+id] = (Z[x+id] + Z[y+id] + Z[z+id])*dev_matrix[i][index[ch - 'A']];
}
}
My problem is that when I run this program the time for each i and j is 0 most of the
time but sometimes it is 10 milliseconds. So the output looks like
Time for i=0 j=0 is 0
Time for i=0 j=1 is 0
.
.
Time for i=15 j=21 is 10
Time for i=15 j=22 is 0
.
I don’t understand why this is happening. I don’t see a thread race condition. If I add
if(i % 20 == 0) cudaThreadSynchronize();
right after the first loop then the Time for i and j is mostly 0. But then the time
for sync is sometimes 10 or even 20. It seems like CUDA is performing many operations
at low cost and then charges a lot for later ones. Any help would be appreciated.
I think you have a misconception about what a kernel call in CUDA actually does on the host. A kernel call is non-blocking and is only added to the device’s queue. If you’re measuring time before and after your kernel call, then the difference has nothing to do with how long your kernel call takes (it would measure the time it takes to add the kernel call to the queue).
You should add a cudaThreadSynchronize() after every kernel call and before you measure end3time. cudaThreadSynchronize() blocks and returns if all kernels in the queue have finished their work.
This is why
made spikes in your measurments.