In all the papers I am reading, I see that the GPU is made up of multiprocessors, and each multiprocessor has 8 processors which together execute a single warp in parallel.
The GPU I am using is an Nvidia GTX 560. It has only 7 multiprocessors, but 48 processors in each multiprocessor.
Does this mean that every multiprocessor in the GTX 560 is able to execute 6 warps in parallel?
Can I say that the maximum number of threads executed in parallel on the GTX 560 is 32*6*7 = 1344?
(32 = threads per warp, 7 = multiprocessors, 6 = warps executed in parallel)
How many multiprocessors are in the fastest Nvidia GPU? Which GPU is this?
What is the maximum amount of global memory that the biggest GPU has?
The papers you are reading are old. The first two generations of CUDA GPUs had 8 cores per multiprocessor (MP) and issued instructions from a single warp at a time (if you want to simplify, each instruction gets executed four times on 8 cores to service a single 32-thread warp).
The Fermi card you have is newer and different. It “dual-issues” instructions from two different warps per multiprocessor (so each warp instruction is executed twice on 16 cores). When the code stream allows it, an additional independent instruction from one of those two warps can be issued onto the remaining 16 cores, i.e. a limited form of superscalar execution. This latter feature is only available on compute capability 2.1 devices. On compute capability 2.0 devices, there are only 32 cores per multiprocessor. But the number of warps retiring instructions per multiprocessor on any given shader clock cycle is two in both cases. Note that there is a rather deep instruction pipeline, so there is considerable latency between issue and retirement, and up to 48 warps can be active per multiprocessor at any instant in time.
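To make the "executed four times on 8 cores" and "executed twice on 16 cores" figures concrete, here is a small sketch of the arithmetic. The function name is made up for illustration, and the model is the same simplification used above (one warp instruction spread evenly over a group of cores); it is not a description of the real pipeline.

```python
# Simplified model only: one warp instruction is spread evenly over a
# group of cores, so it needs warp_size / group_size passes.
WARP_SIZE = 32  # threads per warp


def passes_per_warp_instruction(cores_in_group: int) -> int:
    """Passes needed to service one warp instruction on a group of cores.

    Hypothetical helper for illustration; assumes the group size divides
    the warp size evenly.
    """
    if WARP_SIZE % cores_in_group != 0:
        raise ValueError("group size must divide the warp size")
    return WARP_SIZE // cores_in_group


# First two CUDA generations: 8 cores per multiprocessor.
print(passes_per_warp_instruction(8))
# Fermi: each issued warp instruction goes to a group of 16 cores.
print(passes_per_warp_instruction(16))
```

Under these assumptions the two calls give 4 and 2, matching the numbers quoted in the text.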
So your answer is either 14 warps (2 issuing per multiprocessor × 7 multiprocessors) or 336 warps (48 active per multiprocessor × 7 multiprocessors) in your GTX 560, depending on which definition of “executed in parallel” you wish to adopt. The information I used to answer this mostly comes from Appendix F of the current CUDA Programming Guide.
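As a rough check, the two warp counts can be worked out in a few lines. The constants below are taken from this thread (7 multiprocessors, 2 warps issuing per multiprocessor, up to 48 active warps per multiprocessor); the variable names are my own, and the thread totals are simply those warp counts multiplied by 32, not figures quoted from the Programming Guide.

```python
# Figures from the discussion above for a GTX 560 (compute capability 2.1).
WARP_SIZE = 32                  # threads per warp
MULTIPROCESSORS = 7             # multiprocessors on the GTX 560
WARPS_ISSUING_PER_MP = 2        # warps issuing/retiring per shader clock
MAX_ACTIVE_WARPS_PER_MP = 48    # warps in flight per multiprocessor

# Definition 1: warps retiring instructions on a given clock cycle.
warps_issuing = WARPS_ISSUING_PER_MP * MULTIPROCESSORS
# Definition 2: warps active (in flight) at any instant.
warps_active = MAX_ACTIVE_WARPS_PER_MP * MULTIPROCESSORS

# Derived thread counts (assumption: every warp is a full 32 threads).
threads_issuing = warps_issuing * WARP_SIZE
threads_active = warps_active * WARP_SIZE

print(f"issuing per clock: {warps_issuing} warps = {threads_issuing} threads")
print(f"active at once:    {warps_active} warps = {threads_active} threads")
```

Neither total matches the 32*6*7 = 1344 estimate in the question, because that estimate assumes 6 warps per multiprocessor run at once, whereas each multiprocessor issues from 2 warps per clock.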