Hi, I am new to CUDA programming and have two questions about the CUDA programming model.
In brief, the model describes a hierarchy of threads, blocks, and grids. Threads within a block have access to shared memory and can communicate with each other easily, but cannot do so if they are in different blocks. There is also global memory on the GPU device.
My questions are:
(1) Why do we need a hierarchy consisting of threads and then blocks? Why not a single flat level of threads?
That way any two threads could communicate with each other if needed, which would probably simplify the programming effort.
(2) Why can threads only be arranged in configurations of up to 3 dimensions and not beyond?
Thank you.
1) This gives you a generalized programming model that supports hardware with different numbers of processors: because blocks are independent of each other, the same kernel can run on a GPU with few multiprocessors or many. It also reflects the underlying GPU hardware, which treats threads within a block differently from threads in different blocks with respect to memory access and synchronization.
Threads can communicate via shared memory if they are in the same block, or via global memory otherwise. You can also use synchronization primitives such as __syncthreads(), which acts as a barrier for the threads of a single block.
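To make the two communication paths concrete, here is a minimal sketch of a sum over an array. The names blockSum, sumOnDevice and kThreads are made up for this illustration, the block size of 256 is an assumption, and error checking is left out for brevity. Threads in the same block cooperate through shared memory with __syncthreads() as the barrier, while different blocks only meet in global memory, here through atomicAdd.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // assumed block size; the halving loop needs a power of two

// Each block sums its slice of the input in shared memory, then one thread
// per block adds that partial sum to a single total in global memory.
__global__ void blockSum(const int* in, int n, int* total) {
    __shared__ int partial[kThreads];  // visible only to threads of this block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? in[i] : 0;
    __syncthreads();  // wait until every thread of the block has written its value

    // Tree reduction inside the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();  // barrier for this block only, not for the whole grid
    }

    // Blocks cannot share partial[], so they combine results in global memory.
    if (threadIdx.x == 0) {
        atomicAdd(total, partial[0]);
    }
}

// Hypothetical host helper: copies the data over, launches the kernel,
// and returns the sum. Return codes are not checked in this sketch.
int sumOnDevice(const std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    int* dIn = nullptr;
    int* dTotal = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dTotal, sizeof(int));
    cudaMemcpy(dIn, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int));

    int blocks = (n + kThreads - 1) / kThreads;  // enough blocks to cover n elements
    blockSum<<<blocks, kThreads>>>(dIn, n, dTotal);

    int total = 0;
    cudaMemcpy(&total, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dTotal);
    return total;
}

int main() {
    std::vector<int> data(1000, 1);
    std::printf("sum = %d\n", sumOnDevice(data));
    return 0;
}
```

Note that nothing in the kernel depends on how many blocks run at the same time, which is what lets the same code scale across GPUs with different numbers of multiprocessors.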
2) This is part of the programming model. I suspect it is largely due to user demand for data decomposition of 3-dimensional problems, with little demand for support of further dimensions.
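In practice the 3D limit is not much of a restriction, because a problem with more dimensions can be flattened onto a 1D launch and each thread can recover its own coordinates with integer arithmetic. The sketch below does this for a 4D array. Index4, unflatten4 and fill4D are names invented for this example, the dimensions 2 x 3 x 4 x 5 are arbitrary, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct Index4 { int w, z, y, x; };  // hypothetical 4D coordinate, x varies fastest

// Convert a linear index into 4D coordinates for an array of shape
// (nw, nz, ny, nx) stored in row-major order. nw is not needed for the math.
__host__ __device__ Index4 unflatten4(int linear, int nz, int ny, int nx) {
    Index4 idx;
    idx.x = linear % nx; linear /= nx;
    idx.y = linear % ny; linear /= ny;
    idx.z = linear % nz; linear /= nz;
    idx.w = linear;
    return idx;
}

// One thread per element of the 4D array, launched as a plain 1D grid.
__global__ void fill4D(int* out, int nw, int nz, int ny, int nx) {
    int total = nw * nz * ny * nx;
    int linear = blockIdx.x * blockDim.x + threadIdx.x;
    if (linear >= total) return;  // the last block may have spare threads

    Index4 idx = unflatten4(linear, nz, ny, nx);
    // Encode the coordinates so the result is easy to inspect on the host.
    out[linear] = 1000 * idx.w + 100 * idx.z + 10 * idx.y + idx.x;
}

int main() {
    const int nw = 2, nz = 3, ny = 4, nx = 5;  // arbitrary example shape
    const int total = nw * nz * ny * nx;

    int* dOut = nullptr;
    cudaMalloc(&dOut, total * sizeof(int));

    const int threads = 64;  // assumed block size
    const int blocks = (total + threads - 1) / threads;
    fill4D<<<blocks, threads>>>(dOut, nw, nz, ny, nx);

    std::vector<int> host(total);
    cudaMemcpy(host.data(), dOut, total * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);

    // Element (w=1, z=2, y=3, x=4) is the last one in row-major order.
    std::printf("last element = %d\n", host[total - 1]);
    return 0;
}
```

The built-in 2D and 3D block and grid shapes do the same kind of index bookkeeping for you, which is convenient for image and volume problems but not required.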
The CUDA Programming Guide covers a lot of this. There are also a couple of books available. Programming Massively Parallel Processors: A Hands-on Approach has a good discussion of why GPU hardware is the way it is and how that is reflected in the programming model.