I’m trying out some sample code from the CUDA site http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels.
I simply want to add two arrays A and B of size 4 and store the result in array C. Here is what I have so far:
#include <stdio.h>
#include "util.h"

void print_array(int* array, int size) {
    int i;
    for (i = 0; i < size; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");
}

__global__ void VecAdd(int* A, int* B, int* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main(int argc, char **argv) {
    int N = 4;
    int i;
    int *A = (int *) malloc(N * sizeof(int));
    int *B = (int *) malloc(N * sizeof(int));
    int *C = (int *) malloc(N * sizeof(int));

    for (i = 0; i < N; i++) {
        A[i] = i + 1;
        B[i] = i + 1;
    }

    print_array(A, N);
    print_array(B, N);
    VecAdd<<<1, N>>>(A, B, C);
    print_array(C, N);
    return 0;
}
I’m expecting the C array (the last row of the output) to be 2, 4, 6, 8, but it doesn’t seem to get added:
1 2 3 4
1 2 3 4
0 0 0 0
What am I missing?
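Your kernel is being passed host pointers: A, B and C come from malloc, so they live in CPU memory, which the kernel cannot normally access. The data has to be moved to the GPU first.
First, define the pointers that will hold the data copied to the GPU. In your example, we want to copy the arrays A, B and C from the CPU to the GPU's global memory, so declare one device pointer per array and define the size in bytes that each array will occupy (N * sizeof(int)).
Then allocate space on the GPU for that data, using cudaMalloc.
Now copy the data from the CPU to the GPU, using cudaMemcpy with cudaMemcpyHostToDevice.
Execute the kernel, passing it the device pointers instead of the host pointers.
Copy the results from the GPU back to the CPU (in our example, array C), using cudaMemcpy with cudaMemcpyDeviceToHost.
Finally, free the GPU memory with cudaFree.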
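Putting those steps together, a minimal sketch of the corrected program might look like the following. The `d_` prefix for device pointers and the helper name `vec_add_gpu` are naming assumptions made for this example, not something from the question. Return values are ignored here for brevity, and only the inputs A and B are copied to the device, since C is purely an output.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Each thread adds one pair of elements. Assumes a single block of n threads.
__global__ void VecAdd(const int* A, const int* B, int* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

void print_array(const int* array, int size) {
    for (int i = 0; i < size; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");
}

// Hypothetical helper: adds host arrays A and B into host array C on the GPU.
void vec_add_gpu(const int* A, const int* B, int* C, int n) {
    size_t size = n * sizeof(int);       // bytes occupied by each array

    // 1. Device pointers
    int *d_A = NULL, *d_B = NULL, *d_C = NULL;

    // 2. Allocate device (global) memory
    cudaMalloc((void**)&d_A, size);
    cudaMalloc((void**)&d_B, size);
    cudaMalloc((void**)&d_C, size);

    // 3. Copy the inputs from CPU to GPU
    cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, B, size, cudaMemcpyHostToDevice);

    // 4. Launch the kernel with device pointers: 1 block of n threads
    VecAdd<<<1, n>>>(d_A, d_B, d_C);

    // 5. Copy the result back; this call waits for the kernel to finish
    cudaMemcpy(C, d_C, size, cudaMemcpyDeviceToHost);

    // 6. Free device memory
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
}

int main(void) {
    const int N = 4;
    int *A = (int*)malloc(N * sizeof(int));
    int *B = (int*)malloc(N * sizeof(int));
    int *C = (int*)malloc(N * sizeof(int));

    for (int i = 0; i < N; i++) {
        A[i] = i + 1;
        B[i] = i + 1;
    }

    vec_add_gpu(A, B, C, N);

    print_array(A, N);
    print_array(B, N);
    print_array(C, N);

    free(A);
    free(B);
    free(C);
    return 0;
}
```

Compiled with nvcc and run on a machine with a working CUDA device, the last row should now read 2 4 6 8.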
For debugging purposes, I normally save the status returned by each CUDA function in an array, for example msg_erro[0] = cudaMalloc(...).
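As a rough illustration of that pattern, the sketch below records the status of an allocation, two copies and a free for a single buffer. The array name `msg_erro` follows this answer; `NUM_CALLS`, `record_statuses` and the buffer names are hypothetical and exist only for this example.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

#define NUM_CALLS 4   // one slot per CUDA call whose status is recorded
#define N 4           // number of ints in the test buffer

// Runs allocate / copy in / copy out / free for one buffer and stores the
// status of each call in msg_erro. Returns how many calls did not succeed.
int record_statuses(cudaError_t msg_erro[NUM_CALLS]) {
    const size_t size = N * sizeof(int);
    int h_in[N] = {1, 2, 3, 4};
    int h_out[N] = {0, 0, 0, 0};
    int* d_buf = NULL;

    msg_erro[0] = cudaMalloc((void**)&d_buf, size);
    msg_erro[1] = cudaMemcpy(d_buf, h_in, size, cudaMemcpyHostToDevice);
    msg_erro[2] = cudaMemcpy(h_out, d_buf, size, cudaMemcpyDeviceToHost);
    msg_erro[3] = cudaFree(d_buf);

    int failures = 0;
    for (int i = 0; i < NUM_CALLS; i++) {
        if (msg_erro[i] != cudaSuccess) {
            failures++;
        }
    }
    return failures;
}

int main(void) {
    cudaError_t msg_erro[NUM_CALLS];
    int failures = record_statuses(msg_erro);
    printf("%d of %d CUDA calls failed\n", failures, NUM_CALLS);
    return 0;
}
```

Every runtime call used here returns a `cudaError_t`, so the same assignment works for each step of the vector-add program.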
This is not strictly necessary, but it will save you time if an error occurs during allocation or memory transfer. You can remove all the ‘msg_erro[x] =’ assignments from the code if you wish.
If you keep the ‘msg_erro[x] =’ assignments and an error does occur, you can use a small function that loops over the array and prints each error with cudaGetErrorString.
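One possible version of such a function is sketched here. The name `print_cuda_errors` is hypothetical, and the statuses in `main` are filled in by hand purely to show the output path; in real code they would be the values returned by cudaMalloc, cudaMemcpy and so on.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Prints every non-success status stored in msg_erro, using the runtime's
// cudaGetErrorString for the message. Returns the number of errors found.
int print_cuda_errors(const cudaError_t* msg_erro, int count) {
    int failures = 0;
    for (int i = 0; i < count; i++) {
        if (msg_erro[i] != cudaSuccess) {
            fprintf(stderr, "msg_erro[%d]: %s\n", i,
                    cudaGetErrorString(msg_erro[i]));
            failures++;
        }
    }
    return failures;
}

int main(void) {
    // Hand-filled statuses for illustration only (an assumption of this
    // example): the second entry simulates a failed allocation.
    cudaError_t msg_erro[3] = {cudaSuccess, cudaErrorMemoryAllocation,
                               cudaSuccess};

    int failures = print_cuda_errors(msg_erro, 3);
    printf("%d error(s) found\n", failures);
    return 0;
}
```

Because the index is printed alongside the message, it is easy to see which call in the sequence failed.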