extern "C" void callKernel() { for(int i=0;i<10;i++) { calc<<< grid, thread >>>(d_arr); copyElement<<< grid,

Question

Asked: June 8, 2026 (2026-06-08T12:41:48+00:00)

extern "C" void callKernel()
{
    for(int i=0;i<10;i++)
    {
        calc<<< grid, thread >>>(d_arr);
        copyElement<<< grid, thread >>>(d_arr,d_arr_part,3);
        findMax<<< grid, thread >>>(d_arr_part, d_max);
        positionChange<<< grid, thread >>>(d_arr, d_max);
    }
}

The code above launches four kernels in a loop.

Each kernel does the following.

"calc": performs a calculation on d_arr and updates the values of its elements.

"copyElement": for example, if d_arr is a 4-step array and I only want the 3rd element of it, I allocate a separate variable, d_arr_part, and copy the 3rd element of d_arr into d_arr_part.

"findMax": finds the maximum value in d_arr_part and stores it in d_max.

"positionChange": updates the elements of d_arr according to the value in d_max.
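To make the copyElement and findMax descriptions concrete, here is a minimal sketch of one way such kernels could be written. It is not the code from the question. It assumes that d_arr holds n records of `stride` ints each, that the wanted value is selected by a 0-based `field` index, and that n is greater than zero. The extra kernel parameters and the host helper `maxOfField` are hypothetical names introduced for illustration. The point of the sketch is that a max computed by many threads needs an atomic update (or a proper reduction) and a reset of d_max before each launch; a plain write from every thread would give results that change from run to run.

```cuda
#include <climits>
#include <vector>
#include <cuda_runtime.h>

// Assumed layout: arr holds n records of `stride` ints each.
// Copies the value at 0-based position `field` of every record into part.
__global__ void copyElement(const int *arr, int *part, int n, int stride, int field)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        part[i] = arr[i * stride + field];
}

// Every thread proposes its value; atomicMax makes the concurrent
// updates of *maxOut well defined regardless of thread scheduling.
__global__ void findMax(const int *part, int *maxOut, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicMax(maxOut, part[i]);
}

// Hypothetical host helper: returns the maximum of the chosen field.
// Assumes host.size() is a non-zero multiple of stride.
int maxOfField(const std::vector<int> &host, int stride, int field)
{
    int n = static_cast<int>(host.size()) / stride;
    int *d_arr = nullptr, *d_part = nullptr, *d_max = nullptr;
    cudaMalloc(&d_arr, host.size() * sizeof(int));
    cudaMalloc(&d_part, n * sizeof(int));
    cudaMalloc(&d_max, sizeof(int));
    cudaMemcpy(d_arr, host.data(), host.size() * sizeof(int), cudaMemcpyHostToDevice);

    // d_max has to be reset before each findMax launch,
    // otherwise a stale maximum from an earlier pass survives.
    int init = INT_MIN;
    cudaMemcpy(d_max, &init, sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    copyElement<<<blocks, threads>>>(d_arr, d_part, n, stride, field);
    findMax<<<blocks, threads>>>(d_part, d_max, n);

    // This blocking copy runs after both kernels have finished.
    int result = 0;
    cudaMemcpy(&result, d_max, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_arr);
    cudaFree(d_part);
    cudaFree(d_max);
    return result;
}
```

If the real data is float rather than int, atomicMax does not apply directly, and a shared-memory reduction would be the usual alternative.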

Problem

When I run my program, the results are not consistent: they change on every run. I searched for this problem on Google and found that kernel functions can execute concurrently. My intention is for all the kernel functions to execute in sequence. I read section 3.2.5 of NVIDIA's CUDA C Programming Guide, but I can't work out what to do to solve the problem. If anybody has an idea, please show me the way. Thanks in advance.

1 Answer

Editorial Team · Answer 1 · 2026-06-08T12:41:50+00:00

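You can call cudaDeviceSynchronize() between kernel launches to make the host wait for each kernel to finish before the next one is launched. However, your code does not require this: kernels launched into the same stream (here, the default stream) already run in the order in which they were launched. So I think there might be a bug in your kernels.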
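A minimal sketch of how this could look while debugging is shown below. The kernels `addOne` and `doubleAll` and the helpers `checkKernel` and `runPipeline` are hypothetical stand-ins, not the kernels from the question. The only CUDA runtime calls used are cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString, cudaMalloc, cudaMemcpy and cudaFree. Checking for errors after each launch is useful because a kernel that fails silently is another common source of unexpected results.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernels operating on n ints.
__global__ void addOne(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

__global__ void doubleAll(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Reports launch errors, then waits for the kernel and reports
// any error raised while it was running.
cudaError_t checkKernel(const char *name)
{
    cudaError_t err = cudaGetLastError();      // invalid launch configuration etc.
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // blocks the host until the kernel is done
    if (err != cudaSuccess)
        fprintf(stderr, "%s failed: %s\n", name, cudaGetErrorString(err));
    return err;
}

// Applies (x + 1) * 2 to a single value `iterations` times on the device.
int runPipeline(int start, int iterations)
{
    int *d_val = nullptr;
    cudaMalloc(&d_val, sizeof(int));
    cudaMemcpy(d_val, &start, sizeof(int), cudaMemcpyHostToDevice);

    for (int i = 0; i < iterations; i++)
    {
        addOne<<<1, 32>>>(d_val, 1);
        if (checkKernel("addOne") != cudaSuccess) break;
        doubleAll<<<1, 32>>>(d_val, 1);
        if (checkKernel("doubleAll") != cudaSuccess) break;
    }

    int result = 0;
    cudaMemcpy(&result, d_val, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_val);
    return result;
}
```

Because both kernels go into the default stream, the result is the same with or without the synchronization inside checkKernel. Once the kernels are known to be correct, the per-launch synchronization can be dropped to avoid stalling the host.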
