I’m connecting to a GPU cluster from the outside and I have no idea how to select the device on which to run my CUDA programs.
I know there are two Tesla GPUs in the cluster, and I’d like to choose one of them.
Any ideas how? How do you choose the device you want to use when there are many connected to your computer?
The canonical way to select a device in the runtime API is using cudaSetDevice. That will configure the runtime to perform lazy context establishment on the nominated device. Prior to CUDA 4.0, this call didn’t actually establish a context, it just told the runtime which GPU to try to use. Since CUDA 4.0, this call will establish a context on the nominated GPU at the time of calling. There is also cudaChooseDevice, which will select amongst the available devices to find one which matches criteria supplied by the caller.
You can enumerate the available GPUs on a system with cudaGetDeviceCount, and retrieve their particulars using cudaGetDeviceProperties. The SDK deviceQuery example shows full details of how to do this.
You may need to be careful, however, about how you select GPUs in a multi-GPU system, depending on the host and driver configuration. In both the Linux and the Windows TCC drivers, there exists the option for GPUs to be marked “compute exclusive”, meaning that the driver will limit each GPU to one active context at a time, or “compute prohibited”, meaning that no CUDA program can establish a context on that device. If your code attempts to establish a context on a compute prohibited device, or on a compute exclusive device which is in use, the result will be an invalid device error. In a multiple GPU system where the policy is to use compute exclusivity, the correct approach is not to try to select a particular GPU, but simply to allow lazy context establishment to happen implicitly. The driver will automagically select a free GPU for your code to run on.
The compute mode status of any device can be checked by reading the cudaDeviceProp.computeMode field using the cudaGetDeviceProperties call. Note that you are free to check unavailable or prohibited GPUs and query their properties, but any operation which would require context establishment will fail.
See the runtime API documentation on all of these calls.