I use (CUDA C++) Thrust for GPU GeForce GTX 460SE with asyncEngineCount = 1.

Question

Asked: June 9, 2026

I use Thrust (CUDA C++) on a GeForce GTX 460SE GPU with asyncEngineCount = 1.
As far as I know, this means I can overlap a data transfer in one direction (to or from the GPU) with the execution of a single kernel. But when I use:

cudaStream_t Stream1, Stream2;
cudaStreamCreate(&Stream1);
cudaStreamCreate(&Stream2);
cudaMemcpyAsync(thrust::raw_pointer_cast(d_vec_src.data()), host_ptr1, test_size, cudaMemcpyHostToDevice, Stream1);
cudaMemcpyAsync(host_ptr2, thrust::raw_pointer_cast(d_vec_dst.data()), test_size, cudaMemcpyDeviceToHost, Stream2);
thrust::sort(d_vec_dst.begin(), d_vec_dst.end());
cudaThreadSynchronize();

and Thrust algorithms, everything executes sequentially, as I can see in the NVIDIA Visual Profiler: the transfer from the GPU, the transfer to the GPU, then the kernel. Is this because Thrust algorithms execute in the default stream (stream 0), which can't overlap with anything? And how can I solve this problem?

1 Answer

Editorial Team · Answer 1 · 2026-06-09T19:38:03+00:00

Thrust doesn't presently have a mechanism for controlling the execution stream of its algorithms, so you can't do what you are asking with the current code base. Some users have reported modifying the Thrust code base to accept a stream (for example, this Google Groups thread), but whether that is viable depends on the complexity and structure of the algorithm you use. Some algorithms also perform internal data transfers, so you would need to be very careful not to break things when moving from serial to asynchronous execution.
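For readers on a newer toolkit: Thrust releases from 1.8 onward added execution policies, and thrust::cuda::par.on(stream) lets an algorithm be issued into a chosen stream without patching the library. The following is only a rough sketch under several assumptions that are not in the question: int elements, a hypothetical element count n, and host buffers allocated with cudaMallocHost (asynchronous copies can only overlap with kernels when the host memory is page-locked). The names host_src, host_dst, copy_stream and sort_stream are made up for illustration.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/system/cuda/execution_policy.h>

int main() {
    const size_t n = 1 << 20;              // hypothetical element count
    const size_t bytes = n * sizeof(int);

    // Assumption: pinned (page-locked) host buffers, required for real overlap.
    int *host_src = nullptr, *host_dst = nullptr;
    cudaMallocHost(&host_src, bytes);
    cudaMallocHost(&host_dst, bytes);
    for (size_t i = 0; i < n; ++i) host_src[i] = static_cast<int>(n - i);

    // These constructors do synchronous work; they are setup, not the overlapped part.
    thrust::device_vector<int> d_vec_src(n);
    thrust::device_vector<int> d_vec_dst(host_src, host_src + n);

    cudaStream_t copy_stream, sort_stream;
    cudaStreamCreate(&copy_stream);
    cudaStreamCreate(&sort_stream);

    // Issue the host-to-device copy first, in its own non-default stream.
    cudaMemcpyAsync(thrust::raw_pointer_cast(d_vec_src.data()), host_src, bytes,
                    cudaMemcpyHostToDevice, copy_stream);

    // Run the sort in a different non-default stream (requires Thrust 1.8 or newer).
    // It touches d_vec_dst only, so it does not race with the copy above.
    thrust::sort(thrust::cuda::par.on(sort_stream), d_vec_dst.begin(), d_vec_dst.end());

    // Read the sorted data back in the sort stream, so it is ordered after the sort.
    cudaMemcpyAsync(host_dst, thrust::raw_pointer_cast(d_vec_dst.data()), bytes,
                    cudaMemcpyDeviceToHost, sort_stream);

    cudaStreamSynchronize(copy_stream);
    cudaStreamSynchronize(sort_stream);

    std::printf("sorted: %s\n", std::is_sorted(host_dst, host_dst + n) ? "yes" : "no");

    cudaStreamDestroy(copy_stream);
    cudaStreamDestroy(sort_stream);
    cudaFreeHost(host_src);
    cudaFreeHost(host_dst);
    return 0;
}
```

Whether the profiler actually shows overlap is not guaranteed. With asyncEngineCount = 1 only one copy can run alongside a kernel at a time, and the sort may allocate temporary device memory or block the host until it finishes, both of which can serialize work. Note also that the sketch reads d_vec_dst back only after the sort, in the same stream: copying a vector to the host while it is being sorted in another stream would be a data race.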
