I am interested in using CUDA to program a multi-GPU application.
As far as I know, one can use multiple GPUs to execute two or more kernels simultaneously in parallel. Each kernel’s data resides on the GPU it is executing on.
But what if I want my data and kernel operations to span several cards? How does one do this?
The simpleMultiGPU example in the CUDA SDK is not what I want, since it basically launches the same kernel on multiple GPUs. There is no inter-GPU communication in it, which is what I am interested in.
It sounds like you’re interested in Unified Virtual Addressing (UVA) and peer-to-peer (P2P) communication. See http://developer.download.nvidia.com/CUDA/training/cuda_webinars_GPUDirect_uva.pdf . You should not be communicating between different CUDA blocks anyway, but the techniques I mention should at least let you read and write data across multiple GPUs and access that data in more flexible ways.
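To make that a bit more concrete, here is a minimal sketch of what P2P usage can look like with the CUDA runtime API. It assumes at least two GPUs (devices 0 and 1) in a 64-bit process; the kernel `scale`, the helper `runPeerDemo`, and the buffer size are made up for illustration and are not taken from the SDK sample or the linked slides. `cudaMemcpyPeer` copies a buffer from one GPU to another, and if `cudaDeviceCanAccessPeer` reports support, a kernel running on GPU 1 can dereference a pointer to GPU 0 memory directly after `cudaDeviceEnablePeerAccess`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Print the CUDA error and bail out of the calling function with -1.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::printf("CUDA error: %s\n", cudaGetErrorString(err_)); \
            return -1;                                                \
        }                                                             \
    } while (0)

// Hypothetical kernel: multiply every element by a constant.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns 0 on success (or when fewer than two GPUs are present),
// 1 on a wrong result, -1 on a CUDA error.
int runPeerDemo() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        std::printf("Fewer than two GPUs found, skipping P2P demo.\n");
        return 0;
    }

    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    const int threads = 256, blocks = (n + threads - 1) / threads;
    std::vector<float> host(n, 1.0f);
    float *d0 = nullptr, *d1 = nullptr;

    // Allocate and fill a buffer on GPU 0.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&d0, bytes));
    CHECK(cudaMemcpy(d0, host.data(), bytes, cudaMemcpyHostToDevice));

    // Allocate a buffer on GPU 1 and copy GPU 0 -> GPU 1.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&d1, bytes));
    CHECK(cudaMemcpyPeer(d1, 1, d0, 0, bytes));

    // Kernel on GPU 1 working on its local copy (1.0 * 2.0).
    scale<<<blocks, threads>>>(d1, n, 2.0f);
    CHECK(cudaGetLastError());

    // If the hardware allows it, let GPU 1 touch GPU 0 memory directly.
    int canAccess = 0;
    float expected0 = 1.0f;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 1, 0));
    if (canAccess) {
        CHECK(cudaDeviceEnablePeerAccess(0, 0));  // current device is 1
        scale<<<blocks, threads>>>(d0, n, 3.0f);  // runs on GPU 1, data on GPU 0
        CHECK(cudaGetLastError());
        expected0 = 3.0f;
    }
    CHECK(cudaDeviceSynchronize());

    // Read both buffers back and check the first element of each.
    std::vector<float> out0(n), out1(n);
    CHECK(cudaMemcpy(out1.data(), d1, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaSetDevice(0));
    CHECK(cudaMemcpy(out0.data(), d0, bytes, cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d0));
    CHECK(cudaSetDevice(1));
    CHECK(cudaFree(d1));

    bool ok = (out1[0] == 2.0f) && (out0[0] == expected0);
    std::printf("peer access %s, result %s\n",
                canAccess ? "enabled" : "unavailable", ok ? "ok" : "wrong");
    return ok ? 0 : 1;
}

int main() { return runPeerDemo(); }
```

Compile with something like `nvcc p2p_demo.cu -o p2p_demo` (the file name is arbitrary). Note that direct peer access depends on the GPUs and how they are attached to the system, so the `cudaDeviceCanAccessPeer` check is not optional; the explicit `cudaMemcpyPeer` copy is the fallback that does not rely on it.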