The CUDA programming guide states that __syncthreads() is allowed in conditional code but only if the conditional evaluates identically across the entire thread block, otherwise the code execution is likely to hang or produce unintended side effects.
So if I need to synchronize threads inside a conditional branch, and only some threads of the block take the branch that contains the __syncthreads() call, does this mean that it won’t work?
I’m imagining that there might be all sorts of cases in which you might need to do this; for example, if you have a binary mask and need to apply a certain operation on pixels conditionally. Say, if (mask(x, y) != 0) then execute the code that includes __syncthreads(), otherwise do nothing. How would that be done?
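If you need to go this route, you could split the body into two phases, each guarded by the condition, and place the __syncthreads() call between them, outside the conditional, so that every thread in the block reaches it.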
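A minimal sketch of that two-phase split, under some assumptions: the kernel name `masked_combine`, the host helper `run_masked_combine`, the block size `BLOCK`, and the arithmetic itself are all made up for illustration. The only point is the shape: conditional work, an unconditional barrier, then more conditional work.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumed block size; the launch below must match

// Hypothetical kernel: an active element is doubled, then summed with the
// doubled value of its left neighbour in the block (0 if that neighbour is
// inactive or absent). Inactive elements are copied through unchanged.
__global__ void masked_combine(const float* in, const unsigned char* mask,
                               float* out, int n)
{
    __shared__ float tile[BLOCK];
    const int t = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + t;
    const bool active = (i < n) && (mask[i] != 0);

    // Phase 1 (conditional): only active threads publish a value.
    tile[t] = 0.0f;
    if (active) {
        tile[t] = 2.0f * in[i];
    }

    // Barrier outside any branch, so every thread of the block reaches it.
    __syncthreads();

    // Phase 2 (conditional): only active threads read a neighbour's value.
    if (active) {
        const float left = (t > 0) ? tile[t - 1] : 0.0f;
        out[i] = tile[t] + left;
    } else if (i < n) {
        out[i] = in[i];
    }
}

// Host helper for trying the kernel (error checking omitted for brevity).
std::vector<float> run_masked_combine(const std::vector<float>& in,
                                      const std::vector<unsigned char>& mask)
{
    assert(in.size() == mask.size());
    const int n = static_cast<int>(in.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    float* d_in = nullptr;
    float* d_out = nullptr;
    unsigned char* d_mask = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMalloc(&d_mask, n * sizeof(unsigned char));
    cudaMemcpy(d_in, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_mask, mask.data(), n * sizeof(unsigned char),
               cudaMemcpyHostToDevice);

    masked_combine<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_mask, d_out, n);

    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_mask);
    return out;
}
```

Inactive threads skip both phases but still hit the barrier, which is what the programming guide requires. If the body needs several barriers, the same pattern repeats: one guarded phase per stretch of code between barriers.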
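Alternatively, you could use the condition to set a flag that disables certain operations. For example, if you’re computing a delta update, every thread can run the same code, barriers included, while the flag forces the delta to zero for the threads that should do nothing.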
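Something along these lines, as a sketch rather than a definitive implementation: `masked_delta_update`, `run_masked_delta_update`, `BLOCK`, the `steps` parameter, and the particular delta formula are assumptions chosen for illustration. Every thread runs the loop and both barriers; the `active` flag only decides whether its delta is non-zero.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumed block size; the launch below must match

// Hypothetical kernel: each active element moves halfway towards its right
// neighbour within the block, repeated `steps` times. Inactive elements get
// a zero delta and therefore never change.
__global__ void masked_delta_update(float* data, const unsigned char* mask,
                                    int n, int steps)
{
    __shared__ float tile[BLOCK];
    const int t = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + t;
    const bool in_range = i < n;
    const bool active = in_range && (mask[i] != 0);  // the "disable" flag

    // Right neighbour, clamped to this thread at the block or array edge.
    const int next = (t + 1 < blockDim.x && i + 1 < n) ? t + 1 : t;

    tile[t] = in_range ? data[i] : 0.0f;
    __syncthreads();

    // `steps` is the same for all threads, so the loop is not divergent.
    for (int s = 0; s < steps; ++s) {
        // The flag zeroes the delta instead of skipping the barriers.
        const float delta = active ? 0.5f * (tile[next] - tile[t]) : 0.0f;
        __syncthreads();  // all reads finish before any thread writes
        tile[t] += delta;
        __syncthreads();  // all writes land before the next round of reads
    }

    if (in_range) {
        data[i] = tile[t];
    }
}

// Host helper for trying the kernel (error checking omitted for brevity).
std::vector<float> run_masked_delta_update(std::vector<float> data,
                                           const std::vector<unsigned char>& mask,
                                           int steps)
{
    assert(data.size() == mask.size());
    const int n = static_cast<int>(data.size());
    if (n == 0) return data;

    float* d_data = nullptr;
    unsigned char* d_mask = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_mask, n * sizeof(unsigned char));
    cudaMemcpy(d_data, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_mask, mask.data(), n * sizeof(unsigned char),
               cudaMemcpyHostToDevice);

    masked_delta_update<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_data, d_mask,
                                                            n, steps);

    cudaMemcpy(data.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_mask);
    return data;
}
```

The trade-off is that disabled threads still occupy their slots and execute the barriers, so nothing is saved for them, but the control flow stays uniform across the block and the barriers are always safe.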