This is my first multi-GPU CUDA program:
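#include <cstdio>
#include <cuda_runtime.h>

int main( void ) {
    int count;
    cudaGetDeviceCount( &count );
    // One device pointer per GPU.
    float** gtt = new float*[count];
    for (int i = 0; i < count; i++) {
        cudaSetDevice(i);
        int j;
        cudaGetDevice(&j);  // confirm which device is current
        printf("get device %d\n", j);
        // Allocate and immediately free two floats on device i.
        cudaMalloc((void**)&gtt[i], 2*sizeof(float));
        cudaFree(gtt[i]);
    }
    delete[] gtt;
    return 0;
}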
The program finds 3 devices on the same node, but it crashes with a segmentation fault when the loop reaches the second GPU. The CUDA version is reported as 4010 (CUDA 4.1), and the devices have compute capability 2.0.
Eventually I found the problem. I had set up the CUDA profiling environment with:
The second line causes the problem. There may be a conflict when different GPUs write to the same profiling log file. Changing the second line to:
solves the problem.
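The exact commands are not reproduced above, so the following is only a sketch of what such a setup might look like, not the original settings. It assumes the legacy command-line profiler variables `CUDA_PROFILE` and `CUDA_PROFILE_LOG`, and it assumes the profiler expands a `%d` placeholder in the log name to a per-device (or per-context) number so that each GPU gets its own file. The file name `cuda_profile_%d.log` is made up for illustration.

```shell
# Assumed illustration only; not the original poster's exact settings.

# Turn on the legacy command-line profiler.
export CUDA_PROFILE=1

# Hypothetical log name. The %d placeholder is assumed to be replaced by
# the profiler with a device/context number, giving one log file per GPU
# instead of a single file shared by all of them.
export CUDA_PROFILE_LOG='cuda_profile_%d.log'
```

If the profiler is not needed at all, unsetting `CUDA_PROFILE` before running the program is another way to check whether the crash comes from profiling rather than from the code itself.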