I’m trying to use multiple CUDA devices from multiple OpenMP threads. The devices are initialized (i.e. memory is allocated on them) from the main thread, and then I call cudaSetDevice from different threads to launch kernels on different devices. Threads do not share devices; each thread has exclusive access to its own device.
From what I understand, this should work fine. However, as soon as I launch a kernel on a device from an OpenMP thread which is not the main thread (i.e. omp_get_thread_num() != 0), I get an “invalid device ordinal” error from CUDA:
kernel<<<...>>>(...);
error = cudaDeviceSynchronize(); // returns cudaSuccess
error = cudaGetLastError(); // returns invalid device ordinal error
Am I missing something? Has anyone seen something like this before? I’m using CUDA 5.0.
Just to close this issue: the problem was that I was using cudaGetLastError to check for errors after a kernel launch, but had not checked the return value of an earlier call. cudaGetLastError was therefore returning the error code left behind by a call to cudaGetDeviceInfo, which I mistakenly attributed to the kernel launch itself. If you see this error, I would advise making sure that you’re checking the error values returned by all previous calls to the CUDA API.
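For anyone who lands here with the same symptom, here is a minimal sketch of what checking every call can look like in this kind of setup. It is not my original code: the CUDA_CHECK macro, the fill kernel and the buffer size are made up for illustration, and I'm assuming a host compiler with OpenMP support (e.g. building with nvcc -Xcompiler -fopenmp when the host compiler is gcc). The point is that every runtime call is checked where it happens, so a later cudaGetLastError can only report something about the launch right before it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <omp.h>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): report and exit as soon as any
// runtime call returns something other than cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,   \
                         __LINE__, #call, cudaGetErrorString(err_));   \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Placeholder kernel standing in for the real one.
__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

int main() {
    int device_count = 0;
    CUDA_CHECK(cudaGetDeviceCount(&device_count));

    const int n = 1 << 20;  // arbitrary size for the sketch
    std::vector<int *> buffers(device_count);

    // Main thread allocates on every device, as described in the question.
    for (int dev = 0; dev < device_count; ++dev) {
        CUDA_CHECK(cudaSetDevice(dev));
        CUDA_CHECK(cudaMalloc(&buffers[dev], n * sizeof(int)));
    }

    // One loop iteration per device. The current device is a per-host-thread
    // setting, so each iteration selects its device before doing anything.
    #pragma omp parallel for num_threads(device_count)
    for (int dev = 0; dev < device_count; ++dev) {
        CUDA_CHECK(cudaSetDevice(dev));

        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        fill<<<blocks, threads>>>(buffers[dev], n, dev);

        CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
        CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

        CUDA_CHECK(cudaFree(buffers[dev]));
    }

    std::printf("ran on %d device(s) without errors\n", device_count);
    return 0;
}
```

Exiting on the first failure is only one option; the macro could just as well log and return the error. What matters is that no call's return value is silently dropped before the launch is checked.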