In NVIDIA CUDA C Programming Guide 4.0, section 3.2.5.5.4, it says that two commands

Question

Asked: June 3, 2026 (2026-06-03T14:30:27+00:00)

The NVIDIA CUDA C Programming Guide 4.0, section 3.2.5.5.4, says that two commands from different streams cannot run concurrently if a device-to-device memory copy is issued in between them. I am not sure exactly what that means, and I hope someone can clear up my confusion.

Let’s say my program has two streams, stream 0 and stream 1. The kernels are launched into these streams in the following order.

Kernel 0.0 (stream 0; assume the execution time is 10 ms)

Kernel 1.0 (stream 1; assume the execution time is 1 ms)

Kernel 1.1 (stream 1; assume the execution time is 3 ms)

Kernel 1.2 (stream 1; this kernel causes a device-to-device memory copy; assume the execution time is 1 ms)

Kernel 1.3 (stream 1; assume the execution time is 6 ms)

Let’s also assume the program has no other overhead and the GPU has enough SMs to run these kernels concurrently. My question is whether kernel 0.0 can run concurrently with kernel 1.2 and kernel 1.3. What is the running time of the whole program?


1 Answer

Editorial Team · Answer 1 · 2026-06-03T14:30:28+00:00

A device-to-device memory copy is a copy issued from the host with cudaMemcpy(). Kernels are free to read and write global memory as they please, and that does not count as a memory copy command.

Kernels in different streams may overlap, but there is no guarantee that they will. The actual speedup depends on how much of the SMs each kernel occupies. NVIDIA recommends timing kernel execution with events (a start event and a stop event recorded around the launches) to determine whether the overlapping version is faster than the sequential one. For the sequential baseline, either launch all of the kernels into stream 0 or run your application in the profiler, which serializes kernel execution.
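The event-based comparison could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the question: `busy_kernel` and `time_launch_sequence` are hypothetical names, the spin loop merely stands in for the real kernels with the durations given in the question, and "kernel 1.2" is modelled as a host-issued `cudaMemcpyAsync` device-to-device copy in stream 1. The serialized baseline here puts everything into a single created stream rather than literally stream 0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's kernels: spins for roughly
// `cycles` GPU clock cycles so each launch has a controllable duration.
__global__ void busy_kernel(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Issues the launch sequence from the question and returns the elapsed time
// in milliseconds measured with CUDA events. s0 receives kernel 0.0 and s1
// receives 1.0 to 1.3; pass the same stream twice for a serialized baseline.
float time_launch_sequence(cudaStream_t s0, cudaStream_t s1, long long cycles_per_ms)
{
    const size_t bytes = 1 << 20;            // arbitrary copy size (assumption)
    char *src = NULL, *dst = NULL;
    cudaMalloc((void **)&src, bytes);
    cudaMalloc((void **)&dst, bytes);

    cudaEvent_t start, s1_done, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&s1_done);
    cudaEventCreate(&stop);

    cudaEventRecord(start, s0);
    cudaStreamWaitEvent(s1, start, 0);       // work in s1 begins after `start`

    busy_kernel<<<1, 1, 0, s0>>>(10 * cycles_per_ms);   // kernel 0.0
    busy_kernel<<<1, 1, 0, s1>>>(1 * cycles_per_ms);    // kernel 1.0
    busy_kernel<<<1, 1, 0, s1>>>(3 * cycles_per_ms);    // kernel 1.1
    // Assumption: "kernel 1.2" modelled as a device-to-device copy command.
    cudaMemcpyAsync(dst, src, bytes, cudaMemcpyDeviceToDevice, s1);
    busy_kernel<<<1, 1, 0, s1>>>(6 * cycles_per_ms);    // kernel 1.3

    cudaEventRecord(s1_done, s1);
    cudaStreamWaitEvent(s0, s1_done, 0);     // `stop` follows the work in both streams
    cudaEventRecord(stop, s0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(s1_done);
    cudaEventDestroy(stop);
    cudaFree(src);
    cudaFree(dst);
    return ms;
}

int main()
{
    int khz = 0;                             // clock rate in kHz = cycles per ms
    cudaDeviceGetAttribute(&khz, cudaDevAttrClockRate, 0);

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    busy_kernel<<<1, 1>>>(0);                // warm-up launch, excluded from timing
    cudaDeviceSynchronize();

    float two_streams = time_launch_sequence(s0, s1, khz);
    float one_stream  = time_launch_sequence(s0, s0, khz);
    printf("two streams: %.2f ms, one stream: %.2f ms\n", two_streams, one_stream);

    cudaError_t err = cudaGetLastError();    // report any launch error
    if (err != cudaSuccess) printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    return err == cudaSuccess ? 0 : 1;
}
```

If the two-stream time comes out clearly below the one-stream time, some overlap took place on that device and driver. The numbers will vary between GPUs, so the sketch measures rather than predicts the running time.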
