I know the "Maximum amount of shared memory per multiprocessor" for a GPU with Compute Capability 2.0 is 48 KB, as stated in the guide.
I'm a little confused about how much shared memory I can use for each block, and how many blocks run on a multiprocessor. I'm using a GeForce GTX 580.
On Fermi, you can use up to 16 KB or 48 KB of shared memory per block, depending on the shared memory/L1 cache configuration you select. The number of blocks that run concurrently on a multiprocessor is determined by how much shared memory and how many registers each block requires, up to a maximum of 8. If a block uses the full 48 KB, only a single block can run on a multiprocessor at a time. If each block uses 1 KB, up to 8 blocks can run concurrently per multiprocessor, depending on their register usage.
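To make the arithmetic concrete, here is a minimal host-side sketch of the shared-memory part of that limit. The function name `blocksLimitedBySharedMem` and the constant `kMaxBlocksPerSM` are made up for illustration and are not part of the CUDA API. It only models shared memory and the 8-block cap from the answer above, so the real number can be lower once register usage and threads per block are taken into account.

```cpp
#include <algorithm>
#include <cstddef>

// Resident-block cap per multiprocessor on Fermi (compute capability 2.0),
// as stated in the answer above. Hypothetical name, not a CUDA constant.
const std::size_t kMaxBlocksPerSM = 8;

// Upper bound on concurrently resident blocks per multiprocessor,
// considering shared memory only.
//   smem_config_bytes    : shared memory configured per multiprocessor
//                          (16 KB or 48 KB on Fermi)
//   smem_per_block_bytes : shared memory requested by one block
// Registers and thread count are NOT modelled and can reduce the result.
inline std::size_t blocksLimitedBySharedMem(std::size_t smem_config_bytes,
                                            std::size_t smem_per_block_bytes) {
    if (smem_per_block_bytes == 0) {
        return kMaxBlocksPerSM;            // shared memory is not the limiter
    }
    if (smem_per_block_bytes > smem_config_bytes) {
        return 0;                          // a block this large cannot be scheduled
    }
    // Integer division gives how many blocks fit; the hardware cap still applies.
    return std::min(kMaxBlocksPerSM, smem_config_bytes / smem_per_block_bytes);
}
```

With the 48 KB configuration, a block that requests 48 KB gives a bound of 1, a block that requests 16 KB gives 3, and a block that requests 1 KB hits the cap of 8.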