I want to block some blocks until a variable is set to a particular value, so I wrote this code to test whether a simple do-while loop would work.
__device__ int tag = 0;

__global__ void kernel() {
    if ( threadIdx.x == 0 ) {
        volatile int v;
        do {
            v = tag;
        } while ( v == 0 );
    }
    __syncthreads();
    return;
}
However, it doesn't work: no infinite loop occurs, which is very strange.
Is there another way to block some blocks until a condition is satisfied, or can this code be changed so that it works?
There currently is no reliable way to perform inter-block synchronization in CUDA.
There are hacky ways to achieve some form of locking or blocking between blocks with a modest number of total threads, but they exploit undefined behaviour in the execution model, which is not guaranteed to work the same way on all hardware or to keep working in the future. The only reliable way to ensure synchronization or blocking between blocks is to use separate kernel launches. If you can't make your algorithm work without inter-block synchronization, you either need a new algorithm, or your application is a very poor fit for the GPU architecture.
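To illustrate the separate-launch approach, here is a minimal sketch. It assumes the work can be split into two phases, where the second phase needs the results of every block from the first. The names phase1, phase2 and sum_with_two_launches are hypothetical and chosen only for this example; they are not from the original question. Kernels launched into the same stream run in order, so the boundary between the two launches acts as the inter-block barrier.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Phase 1 (hypothetical example kernel): each block sums its slice of the
// input and writes one partial result to global memory.
__global__ void phase1(const int* in, int* partial, int n) {
    __shared__ int sum;
    if (threadIdx.x == 0) sum = 0;
    __syncthreads();
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&sum, in[i]);
    __syncthreads();
    if (threadIdx.x == 0) partial[blockIdx.x] = sum;
}

// Phase 2 (hypothetical example kernel): launched after phase1 in the same
// stream, so every block of phase1 has finished before it starts.
__global__ void phase2(const int* partial, int numBlocks, int* out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int total = 0;
        for (int b = 0; b < numBlocks; ++b) total += partial[b];
        *out = total;
    }
}

// Host helper: the gap between the two launches is the "barrier".
int sum_with_two_launches(const std::vector<int>& h_in) {
    int n = static_cast<int>(h_in.size());
    if (n == 0) return 0;  // a launch with zero blocks is invalid
    const int threads = 128;
    int blocks = (n + threads - 1) / threads;

    int *d_in = nullptr, *d_partial = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_partial, blocks * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    phase1<<<blocks, threads>>>(d_in, d_partial, n);
    phase2<<<1, 1>>>(d_partial, blocks, d_out);
    assert(cudaGetLastError() == cudaSuccess);

    int result = 0;
    // cudaMemcpy on the default stream waits for both kernels to finish.
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_out);
    return result;
}
```

No block ever spins waiting for another block here, so the result does not depend on how many blocks the hardware keeps resident at once. The cost is one extra kernel launch per synchronization point.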