I am a bit confused about synchronization.
- Using `__syncthreads` you can synchronize threads in a block. Must this (the use of `__syncthreads`) be done only with shared memory? Or does using shared memory with `__syncthreads` give the best performance?
- Generally, threads may safely communicate with each other only if they exist within the same thread block, right? So why don’t we always use shared memory? Because it’s not big enough? And if we don’t use shared memory, how can we ensure that the results are right?
- I have a program that sometimes runs OK (I get the results) and sometimes I get ‘nan’ results without altering anything. Can that be a problem of synchronization?

The use of `__syncthreads` does not involve shared memory; it only ensures synchronization within a block. But you do need to synchronize threads when you want them to share data through shared memory.

We don’t always use shared memory because it is quite small, and because it can slow down your application when used badly, due to potential bank conflicts when shared memory is addressed poorly. Moreover, recent architectures (from compute capability 2.0) implement shared memory in the same on-chip hardware as the L1 cache. Thus, some seasoned CUDA developers recommend not using shared memory and relying on the cache mechanisms only.
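
To make the first point concrete, here is a minimal sketch of threads sharing data through shared memory, with `__syncthreads` separating the writes from the reads. The kernel name `block_sum`, the host helper `gpu_sum`, and the block size of 256 are assumptions made up for illustration, not anything from the original poster's program; error checking is left out for brevity and the input is assumed to be non-empty.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumed threads per block; must be a power of two for this reduction

// Each block sums up to BLOCK consecutive elements of `in` and writes one partial sum.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK];

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    // Every thread fills its own slot; out-of-range threads contribute 0.
    tile[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();  // all writes to tile[] must be done before any thread reads another slot

    // Tree reduction: the number of active threads halves at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            tile[tid] += tile[tid + stride];
        }
        // The barrier is outside the `if`, so every thread of the block reaches it.
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = tile[0];
}

// Hypothetical host helper: launches the kernel and adds up the per-block partial sums on the CPU.
float gpu_sum(const std::vector<float>& host_in) {
    int n = static_cast<int>(host_in.size());
    int blocks = (n + BLOCK - 1) / BLOCK;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, BLOCK>>>(d_in, d_out, n);

    // This blocking copy on the default stream waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Note that `__syncthreads` only orders threads within one block. The partial sums of different blocks are combined on the host here, after the kernel has finished.
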

It can be. If you want to know whether it is a deadlock, try increasing the number of blocks you’re using. If it is a deadlock, your GPU should freeze. If it is not, post your code; it will be easier for us to answer 😉
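
Without seeing the code this is only a guess, but two synchronization mistakes are common enough to sketch: a `__syncthreads` placed inside a branch that only some threads of the block take (undefined behavior, which can hang), and a missing barrier between writing and reading shared memory (a race, which can give results that change from run to run). The kernel `shift_left`, the wrapper `run_shift_left`, and the block size of 128 are hypothetical names and values chosen for this illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 128;  // assumed block size for this sketch

// Each thread outputs the value loaded by its right-hand neighbor in the same block.
// The last thread of a block (or of the array) keeps its own value.
__global__ void shift_left(const float* in, float* out, int n) {
    __shared__ float tile[THREADS];
    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    // WRONG (do not do this): a barrier inside a branch that only some
    // threads of the block take is undefined behavior and can hang:
    //     if (gid < n) { tile[tid] = in[gid]; __syncthreads(); }

    // RIGHT: guard the memory access, not the barrier.
    tile[tid] = (gid < n) ? in[gid] : 0.0f;
    __syncthreads();  // without this, tile[tid + 1] could be read before it is written

    if (gid < n) {
        bool last = (tid == blockDim.x - 1) || (gid == n - 1);
        out[gid] = last ? tile[tid] : tile[tid + 1];
    }
}

// Hypothetical host wrapper; error checking omitted for brevity, input assumed non-empty.
std::vector<float> run_shift_left(const std::vector<float>& host_in) {
    int n = static_cast<int>(host_in.size());
    int blocks = (n + THREADS - 1) / THREADS;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    shift_left<<<blocks, THREADS>>>(d_in, d_out, n);

    // This blocking copy on the default stream waits for the kernel to finish.
    std::vector<float> host_out(n);
    cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

If removing or moving a barrier in your own kernel changes how often the ‘nan’ values show up, that is a hint (not proof) that a race like this is involved.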