I’d like to know whether, when I call cudaMemcpy(…) to put data on the GPU, the values inside the array are copied as well. To explain: I copy the values from one array to another, and then I call cudaMalloc and cudaMemcpy.
// Copying values of the arrays
for(int i = 0; i<16; i++){
array_device_1[i] = array_host_1[i];
array_device_2[i] = array_host_2[i];
}
// Memory allocation of array_device_1 and array_device_2
cudaMalloc((void**) &array_device_1, SIZE_INT*size);
cudaMalloc((void**) &array_device_2, SIZE_INT*size);
// Transfer array_device_1 and array_device_2
cudaMemcpy(array_device_1, array_host_1, SIZE_INT*size, cudaMemcpyHostToDevice);
cudaMemcpy(array_device_2, array_host_2, SIZE_INT*size, cudaMemcpyHostToDevice);
kernel<<<N, N>>>(array_device_1, array_device_2);
cudaMemcpy(array_host_1, array_device_1, SIZE_INT*size, cudaMemcpyDeviceToHost);
cudaMemcpy(array_host_2, array_device_2, SIZE_INT*size, cudaMemcpyDeviceToHost);
cudaFree(array_device_1);
cudaFree(array_device_2);
So, when I execute all these instructions and use the arrays inside the kernel, are the values inside array_device_1 and array_device_2 or not? I tried to print them out after the kernel and noticed that all the arrays are empty! I really can’t understand how to keep the values inside them and then change those values with the kernel function.
Yes, they have their values inside. But you can’t print them out on the host. For that you need to copy your data back using cudaMemcpy with cudaMemcpyDeviceToHost, and then you can print the values of array_host_2.
A bit more explanation: your array_device_* lives on the GPU, and your CPU (which is printing your output) does not have direct access to this data. So you need to copy it back to the CPU’s memory first before printing it out.
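To make this concrete, here is a minimal sketch of the full round trip: allocate on the device, copy host to device, run a kernel, copy device to host, then print the host arrays. The kernel body, the helper name run_on_gpu, the parameter n and the launch configuration are made up for illustration, since the original post does not show them. Note that the host-side loop writing into array_device_* is left out: cudaMemcpy already copies the values, and a pointer returned by cudaMalloc cannot be dereferenced in host code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel (not from the original post): each thread updates one element.
__global__ void kernel(int *a, int *b, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        a[i] += 1;
        b[i] *= 2;
    }
}

// Hypothetical helper: copies both host arrays to the GPU, runs the kernel,
// and copies the results back into the same host arrays. Returns true on success.
bool run_on_gpu(int *array_host_1, int *array_host_2, int n)
{
    int *array_device_1 = nullptr;
    int *array_device_2 = nullptr;
    size_t bytes = n * sizeof(int);

    // Allocate device memory, then copy the host values into it.
    bool ok = cudaMalloc((void **)&array_device_1, bytes) == cudaSuccess
           && cudaMalloc((void **)&array_device_2, bytes) == cudaSuccess
           && cudaMemcpy(array_device_1, array_host_1, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(array_device_2, array_host_2, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threads = 16;
        int blocks = (n + threads - 1) / threads;
        kernel<<<blocks, threads>>>(array_device_1, array_device_2, n);

        // The device-to-host copies wait for the kernel to finish before copying.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(array_host_1, array_device_1, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(array_host_2, array_device_2, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe on the failure path too.
    cudaFree(array_device_1);
    cudaFree(array_device_2);
    return ok;
}

int main()
{
    const int n = 16;
    int array_host_1[n], array_host_2[n];
    for (int i = 0; i < n; i++) {
        array_host_1[i] = i;
        array_host_2[i] = i;
    }

    if (!run_on_gpu(array_host_1, array_host_2, n)) {
        fprintf(stderr, "A CUDA call failed\n");
        return 1;
    }

    // Print the host copies, which now hold the values the kernel wrote.
    for (int i = 0; i < n; i++) {
        printf("%d %d\n", array_host_1[i], array_host_2[i]);
    }
    return 0;
}
```

Checking the return value of each CUDA call is worth the extra lines here: if an allocation or copy fails silently, the host arrays simply keep their old contents, which can look like the data never reached the GPU.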