I am working on a CUDA program in which I have managed to assign a piece of work to a single streaming multiprocessor (SM). For example, suppose I have two pieces of work, A and B, and my GPU has 2 SMs (SM0 and SM1). Is there a way to assign work A exactly to SM0 and work B to SM1?
Can you suggest some ways to do that?
Thanks for your help.
One approach would be to implement work A in (let’s say) kernelA and work B in kernelB, and launch each of them as a 1*1 grid (a single block) in a separate stream, because Fermi and Kepler GPUs can run such kernels concurrently. The reason for the 1*1 grid launch is that if a kernel has more than one block, those blocks may execute on different SMs, and in that case the two kernels cannot execute at the same time (i.e. only one kernel per SM).
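As a rough sketch of the two-stream launch described above (not a definitive implementation): the kernel bodies are placeholders, and the helper names `current_smid` and `launch_a_and_b` are made up for illustration. As far as I know, CUDA has no API for choosing the SM; the hardware block scheduler decides, so the code only reads the `%smid` PTX special register to report where each single-block kernel happened to run. With kernels this short, the two may not overlap at all and could even report the same SM.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: read the ID of the SM the calling thread is on,
// using the PTX special register %smid. This only observes placement;
// it does not control it.
__device__ unsigned int current_smid() {
    unsigned int id;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// Placeholder for work A: records the SM it ran on.
__global__ void kernelA(unsigned int* sm_out) {
    *sm_out = current_smid();
    // ... work A would go here ...
}

// Placeholder for work B: records the SM it ran on.
__global__ void kernelB(unsigned int* sm_out) {
    *sm_out = current_smid();
    // ... work B would go here ...
}

// Hypothetical helper: launch both kernels as single-block grids in
// separate streams and return the SM IDs they reported.
cudaError_t launch_a_and_b(unsigned int* smA, unsigned int* smB) {
    unsigned int* d_ids = nullptr;
    cudaError_t err = cudaMalloc(&d_ids, 2 * sizeof(unsigned int));
    if (err != cudaSuccess) return err;

    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    // One block each (1 thread here for simplicity), in different
    // streams, so the GPU is allowed to run them concurrently.
    kernelA<<<1, 1, 0, streamA>>>(d_ids);
    kernelB<<<1, 1, 0, streamB>>>(d_ids + 1);

    // Wait for both streams to finish, then copy the results back.
    err = cudaDeviceSynchronize();
    unsigned int h_ids[2] = {0, 0};
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_ids, d_ids, sizeof(h_ids), cudaMemcpyDeviceToHost);
    }
    *smA = h_ids[0];
    *smB = h_ids[1];

    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    cudaFree(d_ids);
    return err;
}

int main() {
    unsigned int smA = 0, smB = 0;
    cudaError_t err = launch_a_and_b(&smA, &smB);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("kernelA ran on SM %u, kernelB ran on SM %u\n", smA, smB);
    return 0;
}
```

Running this a few times shows which SMs the scheduler picked. If the real kernels run long enough to overlap, the two blocks will typically be placed on different SMs, but nothing in this sketch forces A onto SM0 and B onto SM1.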
For more details, see this NVIDIA presentation