extern "C" void callKernel()
{
    for (int i = 0; i < 10; i++)
    {
        calc<<< grid, thread >>>(d_arr);
        copyElement<<< grid, thread >>>(d_arr, d_arr_part, 3);
        findMax<<< grid, thread >>>(d_arr_part, d_max);
        positionChange<<< grid, thread >>>(d_arr, d_max);
    }
}
The code above launches four kernels in a loop.
Each kernel does the following:
"calc": performs a calculation on d_arr and updates the values of its elements.
"copyElement": for example, d_arr is a 4-step array and I only want its 3rd element, so I allocate a separate array, d_arr_part, and copy the 3rd element of d_arr into it.
"findMax": finds the maximum value in d_arr_part and stores it in d_max.
"positionChange": updates the elements of d_arr according to the value of d_max.
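A minimal sketch of what copyElement and findMax might look like, under hypothetical assumptions: d_arr stores int records of 4 components each, and the third argument of copyElement is a 0-based component index. The host helper maxOfComponent and all parameter names are illustrative, not taken from the post. findMax uses atomicMax so that many threads updating d_max at the same time cannot race, and d_max is reset to INT_MIN before the launch. A findMax that does `if (v > *d_max) *d_max = v;` without an atomic, or that never resets d_max between loop iterations, would give results that change from run to run.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Hypothetical layout: each record in d_arr has STRIDE ints.
#define STRIDE 4

// Gather component `which` (0-based) of every record into d_part.
__global__ void copyElement(const int *d_arr, int *d_part, int which, int groups)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < groups)                        // guard: the grid may be larger than the data
        d_part[i] = d_arr[i * STRIDE + which];
}

// Race-free max: every thread folds its element into *d_max atomically.
__global__ void findMax(const int *d_part, int *d_max, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicMax(d_max, d_part[i]);
}

// Hypothetical host helper: copies h_arr to the device, runs both kernels
// in the default stream, and returns the maximum of component `which`.
int maxOfComponent(const int *h_arr, int groups, int which)
{
    int *d_arr, *d_part, *d_max;
    int init = INT_MIN, result;
    cudaMalloc(&d_arr, groups * STRIDE * sizeof(int));
    cudaMalloc(&d_part, groups * sizeof(int));
    cudaMalloc(&d_max, sizeof(int));
    cudaMemcpy(d_arr, h_arr, groups * STRIDE * sizeof(int), cudaMemcpyHostToDevice);
    // d_max must be reset before every findMax pass, otherwise an old maximum survives
    cudaMemcpy(d_max, &init, sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (groups + threads - 1) / threads;
    copyElement<<<blocks, threads>>>(d_arr, d_part, which, groups);
    findMax<<<blocks, threads>>>(d_part, d_max, groups);  // same stream: runs after copyElement

    // cudaMemcpy in the default stream waits for the kernels above to finish
    cudaMemcpy(&result, d_max, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_arr);
    cudaFree(d_part);
    cudaFree(d_max);
    return result;
}
```

For example, with h_arr = {1, 2, 3, 9, 5, 6, 7, 4} (two records), maxOfComponent(h_arr, 2, 3) picks 9 and 4 and returns 9.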
Problem
When I run my program, the results are inconsistent: they change every time I run it. I searched for this problem on Google and found that kernel functions are executed concurrently. My intention is for all the kernel functions to execute in sequence. I read section 3.2.5 of NVIDIA's CUDA C Programming Guide, but I can't understand what I need to do to solve the problem. If anybody has an idea, please point me in the right direction. Thanks in advance.
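You can call cudaDeviceSynchronize() between kernel launches to guarantee sequential order. However, your code does not require this: kernels launched into the same stream (here, the default stream) already execute one after another in launch order, so I think there might be a bug in your kernels.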
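A sketch of the ordering guarantee, using hypothetical toy kernels step1 and step2 in place of calc, copyElement, findMax, and positionChange. Both are launched into the default stream, so step2 always sees what step1 wrote. The helper check (also hypothetical) calls cudaGetLastError() and cudaDeviceSynchronize() after each launch; this is not what enforces the order, it only makes a failing kernel report its error right where it happened instead of showing up later as wrong data.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical toy kernels: step2 depends on what step1 wrote.
__global__ void step1(int *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1;
}

__global__ void step2(int *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2;
}

// Debug helper: surface launch and execution errors right after a kernel.
static bool check(const char *what)
{
    cudaError_t err = cudaGetLastError();                   // bad launch configuration
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // fault while the kernel ran
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Runs step1 then step2 `iters` times in the default stream.
// Returns d[0], or -1 if allocation or any kernel failed.
int runLoop(int iters)
{
    const int n = 1024;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int *d = nullptr;
    int h = -1;
    bool ok = cudaMalloc(&d, n * sizeof(int)) == cudaSuccess;
    if (ok) ok = cudaMemset(d, 0, n * sizeof(int)) == cudaSuccess;
    for (int it = 0; ok && it < iters; ++it) {
        step1<<<blocks, threads>>>(d, n);  // same stream: finishes before step2 starts
        ok = check("step1");
        if (!ok) break;
        step2<<<blocks, threads>>>(d, n);
        ok = check("step2");
    }
    if (ok) cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

Each iteration maps x to (x + 1) * 2 starting from 0, so runLoop(3) returns 14 on every run. Synchronizing after every launch costs performance, so check is best kept for debugging builds.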