I wrote a simple kernel to test the functionality of CUDA's __syncthreads(). In the kernel, each thread prints a message if the value updated by thread 0 is not visible to it. Ideally no thread should print the "Not Visible to me" error message, but some threads end up printing it.
Here is the kernel.
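```cuda
#include <cstdio>
#include "cuPrintf.cu"   // cuPrintf helper from the old CUDA SDK samples (assumed to be on the include path)

__device__ int a = 0;

__global__ void kernel()
{
    // Thread 0 of block 0 increments the global counter and fences its write.
    if (threadIdx.x == 0 && blockIdx.x == 0)
    {
        atomicAdd(&a, 1);
        __threadfence();
    }
    __syncthreads();

    // Every thread reads the counter atomically and complains if it still sees 0.
    if (atomicAdd(&a, 0) == 0)
    {
        cuPrintf("Not Visible to me\n");
    }
}

int main()
{
    cudaPrintfInit();
    kernel<<<16, 16>>>();
    cudaPrintfDisplay(stdout, true);
    cudaPrintfEnd();
    return 0;
}
```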
Please help me with this; it is a very simple test program, but it still isn't working. Do I need to set some compiler flags?
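__syncthreads() is a synchronization barrier primitive that only synchronizes threads in the same block. CUDA has no mechanism for safely synchronizing across thread blocks within a single ordinary kernel launch, and no compiler flag changes that. In your kernel, only the other threads of block 0 wait at the barrier for thread 0's increment; the threads in the other 15 blocks never wait for block 0 at all, so they can perform the atomicAdd(&a,0) check before the increment has happened and legitimately see 0. The __threadfence() does not help here either: it only orders thread 0's own memory writes, it does not make any other thread wait.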
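To see the barrier doing what it is designed for, here is a minimal sketch that keeps the test inside each block: thread 0 of every block writes a `__shared__` flag, and after __syncthreads() every thread of that same block checks it. The names `notVisibleCount`, `blockLocalVisibility` and `countNotVisible` are hypothetical, and a device counter replaces cuPrintf so the result can be checked on the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical counter: number of threads that did NOT see their block's write.
__device__ int notVisibleCount = 0;

__global__ void blockLocalVisibility()
{
    __shared__ int flag;              // one copy per block

    if (threadIdx.x == 0)
        flag = blockIdx.x + 1;        // thread 0 of *this* block writes

    __syncthreads();                  // all threads of this block wait here; shared and
                                      // global writes made before it are now visible block-wide

    if (flag != blockIdx.x + 1)
        atomicAdd(&notVisibleCount, 1);
}

// Uses the question's <<<16,16>>> configuration and returns how many threads
// missed their block's write, or -1 on a CUDA error.
int countNotVisible()
{
    int zero = 0;
    if (cudaMemcpyToSymbol(notVisibleCount, &zero, sizeof(int)) != cudaSuccess) return -1;

    blockLocalVisibility<<<16, 16>>>();
    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int count = -1;
    if (cudaMemcpyFromSymbol(&count, notVisibleCount, sizeof(int)) != cudaSuccess) return -1;
    return count;
}

int main()
{
    printf("threads that missed their block's write: %d\n", countNotVisible());
    return 0;
}
```

Because each block only depends on its own thread 0, the barrier is sufficient and the count should be 0.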
Communication and synchronization between thread blocks is not recommended because it breaks the scalability of execution across GPUs with varying numbers of multiprocessors, which is the reason for having thread blocks in the first place: blocks may run in any order, and not all of them are guaranteed to be resident on the GPU at the same time.
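If every block really does need to see a value produced by another block, the usual approach is to split the work at that point into two kernel launches, since kernels issued to the same stream run in order and the first one's global-memory writes are visible to the second. A minimal sketch under that assumption follows; `writePhase`, `readPhase`, `notVisibleCount` and `countNotVisibleTwoLaunches` are hypothetical names, while `a` is the question's global variable.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int a = 0;                 // same global variable as in the question
__device__ int notVisibleCount = 0;   // hypothetical failure counter

// Phase 1: only thread 0 of block 0 updates the global value.
__global__ void writePhase()
{
    if (threadIdx.x == 0 && blockIdx.x == 0)
        atomicAdd(&a, 1);
}

// Phase 2: every thread of every block checks the value.
__global__ void readPhase()
{
    if (atomicAdd(&a, 0) == 0)        // atomic read, as in the question
        atomicAdd(&notVisibleCount, 1);
}

// Returns how many threads still saw a == 0, or -1 on a CUDA error.
int countNotVisibleTwoLaunches()
{
    int zero = 0;
    if (cudaMemcpyToSymbol(a, &zero, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemcpyToSymbol(notVisibleCount, &zero, sizeof(int)) != cudaSuccess) return -1;

    // Both launches go to the default stream, so readPhase does not start until
    // writePhase has finished: the launch boundary acts as the grid-wide barrier.
    writePhase<<<16, 16>>>();
    readPhase<<<16, 16>>>();
    if (cudaGetLastError() != cudaSuccess) return -1;
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int count = -1;
    if (cudaMemcpyFromSymbol(&count, notVisibleCount, sizeof(int)) != cudaSuccess) return -1;
    return count;
}

int main()
{
    printf("threads that saw a == 0: %d\n", countNotVisibleTwoLaunches());
    return 0;
}
```

The extra launch costs some overhead, but unlike a hand-rolled inter-block barrier it does not depend on how many blocks the GPU can keep resident at once.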