I need to sort 20+ arrays, already on the GPU, each of the same

Question


Asked: May 27, 2026 (2026-05-27T03:08:47+00:00)

I need to sort 20+ arrays, already on the GPU, each of the same length, by the same keys. I cannot use sort_by_key() directly, since it sorts the keys as well (making them useless for sorting the next array). Here is what I tried instead:

thrust::device_vector<int>  indices(N); 
thrust::sequence(indices.begin(),indices.end());
thrust::sort_by_key(keys.begin(),keys.end(),indices.begin());

thrust::gather(indices.begin(),indices.end(),a_01,a_01);
thrust::gather(indices.begin(),indices.end(),a_02,a_02);
...
thrust::gather(indices.begin(),indices.end(),a_20,a_20);

This does not seem to work, since gather() expects the output array to be different from the input array, i.e. this works:

thrust::gather(indices.begin(),indices.end(),a_01,o_01);
...

However, I would prefer not to allocate 20+ extra arrays for this task. I know that there is a solution using a thrust::tuple, thrust::zip_iterator and thrust::sort_by_key(), similar to here. However, I can only combine up to 10 arrays in a tuple, so I would need to duplicate the key vector again. How would you tackle this task?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T03:08:48+00:00

Well, you really only need to allocate one extra array if you are OK with manipulating pointers to device_vector instead:

thrust::device_vector<int>  indices(N); 
thrust::sequence(indices.begin(),indices.end());
thrust::sort_by_key(keys.begin(),keys.end(),indices.begin());

thrust::device_vector<int> temp(N);
thrust::device_vector<int> *sorted = &temp;
thrust::device_vector<int> *pa_01 = &a_01;
thrust::device_vector<int> *pa_02 = &a_02;
...
thrust::device_vector<int> *pa_20 = &a_20;

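// gather() takes iterators, so pass begin() of the source and destination vectors.
// After each gather, the pointer follows the sorted data and the old
// source vector becomes the scratch buffer for the next array.
thrust::gather(indices.begin(), indices.end(), pa_01->begin(), sorted->begin());
pa_01 = sorted; sorted = &a_01;
thrust::gather(indices.begin(), indices.end(), pa_02->begin(), sorted->begin());
pa_02 = sorted; sorted = &a_02;
...
thrust::gather(indices.begin(), indices.end(), pa_20->begin(), sorted->begin());
pa_20 = sorted; sorted = &a_20;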

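Or something like that should work anyway. Note that afterwards the sorted data must be accessed through pa_01 ... pa_20, since each named vector now holds a different array's data. You would also need to make sure the temp device_vector is not automatically deallocated when it goes out of scope while one of the pointers still refers to it. I suggest allocating the device memory with cudaMalloc and wrapping the raw pointers with device_ptr instead of using automatic device_vectors.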
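
A variation on the same idea avoids the pointer bookkeeping: gather into a single scratch buffer, then swap that buffer with the array it was gathered from. The sketch below shows the pattern on the host with std::vector, so it compiles without CUDA. The names gather_host, sorted_permutation and reorder_all are hypothetical helpers for illustration, not Thrust functions, and the sketch assumes every array holds int and has the same length as the keys.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side stand-in for thrust::gather: out[i] = in[map[i]].
// `out` must not be the same vector as `in`, mirroring gather's restriction.
void gather_host(const std::vector<int>& map,
                 const std::vector<int>& in,
                 std::vector<int>& out) {
    for (std::size_t i = 0; i < map.size(); ++i) {
        out[i] = in[map[i]];
    }
}

// Hypothetical helper: returns the permutation that sorts `keys`.
// Unlike sort_by_key, this host version leaves `keys` untouched.
std::vector<int> sorted_permutation(const std::vector<int>& keys) {
    std::vector<int> indices(keys.size());
    std::iota(indices.begin(), indices.end(), 0);  // 0, 1, 2, ... like thrust::sequence
    std::stable_sort(indices.begin(), indices.end(),
                     [&keys](int a, int b) { return keys[a] < keys[b]; });
    return indices;
}

// Reorders every array by `indices` using one scratch buffer in total.
void reorder_all(const std::vector<int>& indices,
                 const std::vector<std::vector<int>*>& arrays) {
    std::vector<int> scratch(indices.size());
    for (std::vector<int>* a : arrays) {
        gather_host(indices, *a, scratch);
        a->swap(scratch);  // O(1): exchanges the buffers, copies no elements
    }
}
```

On the device, the loop body would presumably become thrust::gather(indices.begin(), indices.end(), a.begin(), scratch.begin()) followed by a.swap(scratch), since device_vector provides a swap member that exchanges storage without copying elements. Each named vector then ends up holding its own sorted data, and because every buffer is still owned by a vector object, the scope problem with temp does not arise.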
