I’m trying to test out a sample code from the CUDA site http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels

Question

Asked: June 14, 2026 (2026-06-14T17:31:19+00:00)

I’m trying to test some sample code from the CUDA site: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#kernels

I simply want to add two arrays A and B of size 4 and store the result in array C. Here is what I have so far:

#include <stdio.h>
#include "util.h"

void print_array(int* array, int size) {
    int i;
    for (i = 0; i < size; i++) {
        printf("%d ", array[i]);
    }
    printf("\n");
}

__global__ void VecAdd(int* A, int* B, int* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main(int argc, char **argv) {
    int N = 4;
    int i;
    int *A = (int *) malloc(N * sizeof(int));
    int *B = (int *) malloc(N * sizeof(int));
    int *C = (int *) malloc(N * sizeof(int));

    for (i = 0; i < N; i++) {
        A[i] = i + 1;
        B[i] = i + 1;
    }

    print_array(A, N);
    print_array(B, N);

    VecAdd<<<1, N>>>(A, B, C);
    print_array(C, N);
    return 0;
}

I’m expecting the C array (the last row of the output) to be 2, 4, 6, 8, but it doesn’t seem to get added:

1 2 3 4
1 2 3 4
0 0 0 0

What am I missing?

1 Answer

Editorial Team · Answer 1 · 2026-06-14T17:31:22+00:00

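Your kernel is being passed pointers returned by malloc, which refer to host (CPU) memory. The kernel needs pointers to device memory instead, and because no CUDA error is checked, the failure is silent and C is never written.

First, you have to define the pointers that will hold the data on the GPU. In your example, we want the arrays ‘a’, ‘b’ and ‘c’ to be available in the GPU's global memory: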

int a[array_size], b[array_size],c[array_size]; // your original arrays
int *a_cuda,*b_cuda,*c_cuda;                    // defining the "cuda" pointers

Next, define the size in bytes that each array will occupy:

int size = array_size * sizeof(int); // Is the same for the 3 arrays

Then allocate device memory for each array with cudaMalloc:

msg_erro[0] = cudaMalloc((void **)&a_cuda,size);
msg_erro[1] = cudaMalloc((void **)&b_cuda,size);
msg_erro[2] = cudaMalloc((void **)&c_cuda,size);

Now copy the data from the CPU to the GPU:

msg_erro[3] = cudaMemcpy(a_cuda, a, size, cudaMemcpyHostToDevice);
msg_erro[4] = cudaMemcpy(b_cuda, b, size, cudaMemcpyHostToDevice);
msg_erro[5] = cudaMemcpy(c_cuda, c, size, cudaMemcpyHostToDevice); // optional: c is output only, so this copy can be omitted

Execute the kernel:

int blocks = 1;                      // one block, as in your <<<1, N>>> launch
int threads_per_block = array_size;  // one thread per element
VecAdd<<<blocks, threads_per_block>>>(a_cuda, b_cuda, c_cuda);
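For the four-element example a single block with one thread per element is enough. For arrays larger than one block can hold, a common pattern is to fix the threads per block and round the block count up. The sketch below shows only that host-side arithmetic; `blocks_for` is a hypothetical helper name, not part of the CUDA API, and the code is plain C/C++ that also compiles under nvcc.

```cpp
/* Hypothetical helper (not from the CUDA API): smallest number of blocks
   such that blocks * threads_per_block >= n (ceiling division).
   Assumes n >= 0 and threads_per_block > 0. */
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

/* With such a launch the grid may contain more threads than elements, so
   the kernel would need a global index and a bounds check, for example:
       int i = blockIdx.x * blockDim.x + threadIdx.x;
       if (i < n) C[i] = A[i] + B[i];
   where n is an extra kernel parameter (an assumption of this sketch). */
```

The launch would then look like VecAdd<<<blocks_for(array_size, 256), 256>>>(...), where 256 is just a commonly used block size and not a requirement.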

Copy the result from the GPU back to the CPU (in this example, array ‘c’):

msg_erro[6] = cudaMemcpy(c,c_cuda,size,cudaMemcpyDeviceToHost);

Free the device memory:

cudaFree(a_cuda);
cudaFree(b_cuda);
cudaFree(c_cuda);

For debugging purposes, I normally save the status returned by each call in an array, like this:

cudaError_t msg_erro[7]; // one slot per call above (indices 0 to 6)

This is not strictly necessary, but it will save you time if an error occurs during allocation or a memory transfer. You can remove every ‘msg_erro[x] =’ from the code above if you wish.

If you keep the ‘msg_erro[x] =’ assignments and an error does occur, you can print the errors with a function like the following:

void printErros(cudaError_t *erros, int size)
{
    for (int i = 0; i < size; i++)
        printf("{%d} => %s\n", i, cudaGetErrorString(erros[i]));
}
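Putting the steps together, a minimal sketch of the whole program might look like the following. It assumes a CUDA-capable device and compilation with nvcc. The `CHECK` macro is a hypothetical helper introduced here instead of the msg_erro array, and the call to cudaGetLastError is added because a kernel launch does not return a status itself. The unnecessary copy of ‘c’ to the device is left out.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical helper macro: print the CUDA error and exit if a call fails. */
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

/* Same kernel as in the question: one thread per element, single block. */
__global__ void VecAdd(int* A, int* B, int* C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main(void) {
    const int array_size = 4;
    const size_t size = array_size * sizeof(int);

    int a[array_size], b[array_size], c[array_size];  /* host arrays */
    int *a_cuda, *b_cuda, *c_cuda;                    /* device pointers */

    for (int i = 0; i < array_size; i++) {
        a[i] = i + 1;
        b[i] = i + 1;
    }

    /* Allocate device memory. */
    CHECK(cudaMalloc((void **)&a_cuda, size));
    CHECK(cudaMalloc((void **)&b_cuda, size));
    CHECK(cudaMalloc((void **)&c_cuda, size));

    /* Copy the inputs from host to device. */
    CHECK(cudaMemcpy(a_cuda, a, size, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(b_cuda, b, size, cudaMemcpyHostToDevice));

    /* Launch with device pointers, then check for a launch error. */
    VecAdd<<<1, array_size>>>(a_cuda, b_cuda, c_cuda);
    CHECK(cudaGetLastError());

    /* Copy the result back; this cudaMemcpy waits for the kernel to finish. */
    CHECK(cudaMemcpy(c, c_cuda, size, cudaMemcpyDeviceToHost));

    for (int i = 0; i < array_size; i++) {
        printf("%d ", c[i]);
    }
    printf("\n");

    CHECK(cudaFree(a_cuda));
    CHECK(cudaFree(b_cuda));
    CHECK(cudaFree(c_cuda));
    return 0;
}
```

Exiting on the first failed call is a design choice of this sketch; collecting the statuses in an array and printing them afterwards, as described above, works just as well.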
