Is it safe to use __syncthreads() in a block where I have purposefully dropped threads using return?
The documentation states that __syncthreads() must be called by every thread in the block or else it will lead to a deadlock, but in practice I have never experienced such behavior.
Sample code:
```cuda
__global__ void kernel(float* data, size_t size) {
    // Drop excess threads if user put too many in kernel call.
    // After the return, there are `size` active threads.
    if (threadIdx.x >= size) {
        return;
    }

    // ... do some work ...

    __syncthreads(); // Is this safe?

    // For the rest of the kernel, we need to drop one excess thread
    // After the return, there are `size - 1` active threads
    if (threadIdx.x + 1 == size) {
        return;
    }

    // ... do more work ...

    __syncthreads(); // Is this safe?
}
```
The answer to the short question is "No". Warp-level branch divergence around a `__syncthreads()` instruction will cause a deadlock and result in a kernel hang. Your code example is not guaranteed to be safe or correct. The correct way to implement the code is to replace each early `return` with a conditional that guards only the work, so that the `__syncthreads()` instructions are executed unconditionally by every thread in the block.

EDIT: Just to add a bit of additional information which confirms this assertion, `__syncthreads()` calls get compiled into the PTX `bar.sync` instruction on all architectures. The PTX 2.0 guide (p. 133) documents `bar.sync` and includes a warning that, in conditionally executed code, a `bar` instruction should only be used if it is known that all threads evaluate the condition identically (that is, the warp does not diverge). So despite any assertions to the contrary, it is not safe to have conditional branching around a `__syncthreads()` call unless you can be 100% certain that every thread in any given warp follows the same code path and no warp divergence can occur.