Say I have three shared memory arrays: a, b, and c. I am not sure whether the following thread arrangement will cause control divergence:
if (threadIdx.x < 64)
{
    if (threadIdx.x == 1)
        for (int i = 0; i < N; i++)
            c[threadIdx.x] += a[threadIdx.x] * a[threadIdx.x];  // thread 1: a * a
    else
        for (int i = 0; i < N; i++)
            c[threadIdx.x] += a[threadIdx.x] * b[threadIdx.x];  // other threads: a * b
}
If it does, how badly will it affect performance? Is there an efficient way to handle the problem? Thanks.
If there is more than one thread per block, I would expect divergence in one warp of each block (whichever warp holds thread 1, i.e. the first warp).
But the difference between your two loops is only in which memory they access, not in the instructions they execute. So I would do this instead…
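A minimal sketch of that idea, which may not be exactly what the reply had in mind: choose the second operand's array once with a conditional expression, then run a single loop that every thread executes identically. The kernel name `accumulate`, the sizes `N` and `BLOCK`, the `float` element type, and the per-thread indexing of `c` are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4        // hypothetical loop count
#define BLOCK 128  // hypothetical block size (4 warps)

__global__ void accumulate(const float *a_in, const float *b_in, float *c_out)
{
    // Assumed layout: one element per thread in each shared array.
    __shared__ float a[BLOCK], b[BLOCK], c[BLOCK];
    unsigned t = threadIdx.x;
    a[t] = a_in[t];
    b[t] = b_in[t];
    c[t] = 0.0f;
    __syncthreads();

    if (t < 64) {  // boundary falls on a warp edge, so whole warps take it or skip it
        // Select the source array once; the loop body is the same for all threads.
        const float *rhs = (t == 1) ? a : b;
        for (int i = 0; i < N; i++)
            c[t] += a[t] * rhs[t];
    }
    c_out[t] = c[t];
}

int main()
{
    float ha[BLOCK], hb[BLOCK], hc[BLOCK];
    for (int i = 0; i < BLOCK; i++) { ha[i] = (float)i; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, sizeof(ha));
    cudaMalloc(&db, sizeof(hb));
    cudaMalloc(&dc, sizeof(hc));
    cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);

    accumulate<<<1, BLOCK>>>(da, db, dc);
    cudaMemcpy(hc, dc, sizeof(hc), cudaMemcpyDeviceToHost);

    // Compare against the original two-branch formulation computed on the host.
    int bad = 0;
    for (int i = 0; i < BLOCK; i++) {
        float rhs = (i == 1) ? ha[i] : hb[i];
        float expect = (i < 64) ? N * ha[i] * rhs : 0.0f;
        if (hc[i] != expect) bad++;
    }
    printf("%s\n", bad ? "mismatch" : "ok");

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return bad;
}
```

The compiler can typically turn the conditional pointer selection into a select instruction rather than a branch, so the first warp no longer has to run the two loops one after the other. Even if it does emit a branch, the divergent region shrinks from the whole loop to a single assignment.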