Using CUDA, I want to allocate one array per GPU from a function other than main(), but I must have missed something with regard to pointer arithmetic. Here’s what I tried:
#include <stdlib.h>
#include <cuda_runtime.h>

// Purpose: initialize A and the pitch
void InitThisMemory(int ***A, int N, int Nout, size_t *pitch, int height, int width);

int main(void){
    int** A;
    int N = 10;
    int NOut = 2;
    int height = 2, width = 2;
    size_t pitch;
    InitThisMemory(&A, N, NOut, &pitch, height, width);
    return 0;
}

void InitThisMemory(int ***A, int N, int Nout, size_t *pitch, int height, int width){
    int i;
    *A = (int**)malloc(Nout * sizeof(int*));
    for(i = 0; i < Nout; i++){
        cudaSetDevice(i);
        cudaMallocPitch((void**)&(*A[i]), &(*pitch), width, height);
    }
}
Disclaimer: Not my actual code but this should reproduce the error. Let me know if I missed an allocation of a variable somewhere.
Why do I think that the problem is in the arithmetic? Simply because this works pretty well if Nout = 1 (which means that I am using only one device).
Any ideas?
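Your bug, I think, is writing (void**)&(*A[i]) instead of (void **) (&(*A)[i]). Because [] binds more tightly than unary *, *A[i] means *(A[i]), which for i > 0 indexes past the single int** that A points to; at i = 0 the two expressions coincide, which is why Nout = 1 works. That said, I recommend you refactor as follows:
- store the malloc() return value in a local variable;
- use that local variable in the calls to cudaMallocPitch();
- pass back the malloc() return value only if all cudaMallocPitch() calls succeed.
If you do these things, then it will be simpler to write correct cleanup code in the event that one of the cudaMallocPitch() calls fails, and you needn’t propagate the passback unless everything has succeeded.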
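The refactor recommended above could look roughly like the following. This is only a sketch under some assumptions: alloc_on_device() and free_on_device() are hypothetical stand-ins (built on malloc/free) for cudaSetDevice() followed by cudaMallocPitch() or cudaFree(), so that the example compiles without the CUDA toolkit, and fail_dev is a made-up hook for simulating a failed allocation. The int return code is also an assumption; the function in the question returns nothing.

```c
#include <stdlib.h>

/* Hypothetical hook: set to a device index to simulate a failure there. */
static int fail_dev = -1;

/* Hypothetical stand-in for cudaSetDevice(dev) followed by
 * cudaMallocPitch(ptr, pitch, width, height). Returns 0 on success. */
static int alloc_on_device(int dev, void **ptr, size_t *pitch,
                           size_t width, size_t height)
{
    if (dev == fail_dev)
        return 1;
    *pitch = (width + 255) / 256 * 256;   /* made-up row alignment */
    *ptr = malloc(*pitch * height);
    return *ptr == NULL;
}

/* Hypothetical stand-in for cudaSetDevice(dev) followed by cudaFree(ptr). */
static void free_on_device(int dev, void *ptr)
{
    (void)dev;
    free(ptr);
}

/* Returns 0 on success. *A is written only if every allocation succeeded. */
int InitThisMemory(int ***A, int N, int Nout, size_t *pitch,
                   int height, int width)
{
    (void)N;  /* unused here; kept to match the signature in the question */

    /* Step 1: keep the malloc() result in a local variable. */
    int **local = (int **)malloc(Nout * sizeof *local);
    if (local == NULL)
        return -1;

    for (int i = 0; i < Nout; i++) {
        /* Step 2: work on the local array, so no (*A)[i] is needed. */
        void *p = NULL;
        if (alloc_on_device(i, &p, pitch, (size_t)width, (size_t)height) != 0) {
            /* Cleanup: release what was allocated on devices 0..i-1. */
            while (i-- > 0)
                free_on_device(i, local[i]);
            free(local);
            return -1;
        }
        local[i] = (int *)p;
    }

    /* Step 3: pass the array back only after everything succeeded. */
    *A = local;
    return 0;
}
```

In real CUDA code the stand-ins would be replaced by the actual runtime calls, comparing each return value against cudaSuccess. Because all indexing goes through local, the precedence problem with *A[i] cannot arise, and the caller's pointer is left untouched on failure.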