I have a program where I do a bunch of calculations on GPU, then

Question

Asked: May 23, 2026 (2026-05-23T06:52:25+00:00)

I have a program where I do a bunch of calculations on the GPU, then do memory operations with those results on the CPU, then take the next batch of data and do the same all over again. It would be a lot faster if I could finish the first set of calculations and then start on the second batch while my CPU churned away at the memory operations. How would I do that?

1 Answer

Editorial Team · Answer 1 · 2026-05-23T06:52:25+00:00

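All CUDA kernel launches (e.g. function<<<blocks, threads>>>()) are asynchronous with respect to the host: they return control to the calling host thread immediately, before the kernel has finished. You can therefore run CPU work in parallel with GPU work simply by placing the CPU work after the kernel launch, and synchronizing (for example with cudaDeviceSynchronize()) before you read the kernel's results.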
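As a rough illustration of the point above, here is a minimal sketch. The kernel `square`, the helper `cpu_work`, and the wrapper `run_overlap_demo` are hypothetical names made up for this example, not part of any CUDA API. The CPU loop starts right after the launch returns, and the host only waits once it actually needs the GPU result.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: squares each element in place.
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= data[i];
}

// Hypothetical stand-in for unrelated CPU-side work.
static long cpu_work(int n) {
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += i % 7;
    return sum;
}

// Returns 0 on success, 1 on a CUDA error or an unexpected result.
int run_overlap_demo() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    std::vector<float> host(n, 2.0f);

    float *d_data = nullptr;
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return 1;
    if (cudaMemcpy(d_data, host.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) return 1;

    // The launch only queues the kernel; control returns to the host at once.
    square<<<(n + 255) / 256, 256>>>(d_data, n);
    if (cudaGetLastError() != cudaSuccess) return 1;

    // This CPU work runs while the GPU is (potentially) still busy.
    long cpu_result = cpu_work(n);

    // Wait for the kernel before touching its output.
    if (cudaDeviceSynchronize() != cudaSuccess) return 1;
    if (cudaMemcpy(host.data(), d_data, bytes, cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_data);

    printf("cpu_result = %ld, host[0] = %f\n", cpu_result, host[0]);
    return (host[0] == 4.0f && host[n - 1] == 4.0f) ? 0 : 1;
}

int main() { return run_overlap_demo(); }
```

Note that the plain cudaMemcpy() calls here block the host, so in this sketch only the kernel itself overlaps with the CPU work.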

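If you also need to transfer data between GPU and CPU while a kernel is running, you will need a GPU that can overlap copies with kernel execution. Check the deviceOverlap field returned by cudaGetDeviceProperties() (newer CUDA versions deprecate it in favor of asyncEngineCount). The copy must be issued with cudaMemcpyAsync() in a separate, non-default CUDA stream, and the host buffer must be page-locked (allocated with cudaMallocHost() or cudaHostAlloc()), otherwise the copy will not overlap with other work.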
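For the batch loop described in the question, one possible arrangement is a two-stream, double-buffered pipeline. This is only a sketch under assumptions: the kernel `add_one`, the helper `process_on_cpu`, the batch count, and the buffer sizes are all invented for illustration, and error handling simply returns early without freeing resources. While the CPU processes batch b-1, the GPU is already copying and computing batch b in the other stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); return 1; } } while (0)

// Hypothetical kernel standing in for the GPU calculations.
__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical stand-in for the CPU-side memory operations.
static double process_on_cpu(const float *buf, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += buf[i];
    return sum;
}

// Returns 0 on success, 1 on a CUDA error or an unexpected result.
int run_pipeline_demo() {
    const int n = 1 << 18, num_batches = 4;
    const size_t bytes = n * sizeof(float);

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    printf("asyncEngineCount = %d\n", prop.asyncEngineCount);

    float *h_in[2], *h_out[2], *d_buf[2];
    cudaStream_t stream[2];
    for (int s = 0; s < 2; ++s) {
        CHECK(cudaMallocHost((void **)&h_in[s], bytes));   // page-locked host memory
        CHECK(cudaMallocHost((void **)&h_out[s], bytes));
        CHECK(cudaMalloc((void **)&d_buf[s], bytes));
        CHECK(cudaStreamCreate(&stream[s]));
    }

    double total = 0.0;
    for (int b = 0; b <= num_batches; ++b) {
        if (b < num_batches) {
            int s = b % 2;  // buffers of stream s are free: batch b-2 is already processed
            for (int i = 0; i < n; ++i) h_in[s][i] = (float)b;
            CHECK(cudaMemcpyAsync(d_buf[s], h_in[s], bytes, cudaMemcpyHostToDevice, stream[s]));
            add_one<<<(n + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], n);
            CHECK(cudaGetLastError());
            CHECK(cudaMemcpyAsync(h_out[s], d_buf[s], bytes, cudaMemcpyDeviceToHost, stream[s]));
        }
        if (b > 0) {
            int p = (b - 1) % 2;
            CHECK(cudaStreamSynchronize(stream[p]));   // wait only for the previous batch
            total += process_on_cpu(h_out[p], n);      // runs while batch b is on the GPU
        }
    }

    for (int s = 0; s < 2; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d_buf[s]);
        cudaFreeHost(h_in[s]);
        cudaFreeHost(h_out[s]);
    }
    // Batch b holds the value b + 1 after the kernel, so the sum is (1+2+3+4) * n.
    return (total == 10.0 * n) ? 0 : 1;
}

int main() { return run_pipeline_demo(); }
```

Using cudaStreamSynchronize() on one stream, rather than cudaDeviceSynchronize(), is the design choice that keeps the other stream's work running while the CPU waits.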

The NVIDIA CUDA SDK includes examples that demonstrate this functionality, for example the “simpleStreams” and “asyncAPI” samples.
