I am writing a cuda kernel which requires me to allocate an array of

Question

Asked: June 8, 2026 (2026-06-08T01:06:12+00:00)

I am writing a CUDA kernel which requires me to allocate an array of aligned structs on the device.
I am getting the correct results from my computations and I need to write the values to this array starting from index 0.

When I try to write to this array and copy the results back to the host, some of the values are displayed as zero.

Clearly, I am not incrementing the index correctly. I tried using a counter which I increment with atomicAdd(), but I still get some values as zero.

To be precise, I may use 1000 threads in my kernel for the computations, but my allocated output array can have a size of less than 100 or more than 10000.

My question is: how do I make each of these threads write its value to exactly one location of the array (as the values are calculated) and increment the array index/counter by 1, without one thread overwriting another's value?

Any help will be appreciated. Thanks in advance.


1 Answer

Editorial Team · Answer 1 · 2026-06-08T01:06:12+00:00

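You can use atomicAdd(). It returns the value the counter held before the increment, so each thread can use that return value as its own unique index:

    int old_i = atomicAdd(&i, 1);  // i is the shared counter in device memory, starting at 0
    out_array[old_i] = val;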
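As a fuller illustration of the same idea, here is a minimal sketch rather than a definitive implementation. The Result struct, the "keep positive values" condition, and the names appendResults and runAppend are all assumptions made for the example, since the original post does not show its struct or kernel. The sketch zeroes the counter before launch and checks the reserved slot against the array capacity, because the output array may be smaller than the number of results.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical aligned result struct; the original post does not show its layout.
struct __align__(16) Result {
    float value;
    int   source;  // index of the thread that produced the value
};

// Each thread that has a result reserves one slot with atomicAdd and writes only there.
__global__ void appendResults(const float* in, int n, Result* out,
                              unsigned int capacity, unsigned int* counter)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    float v = in[tid];
    if (v <= 0.0f) return;  // assumed condition: only positive values count as results

    // atomicAdd returns the counter value from before the increment,
    // so no two threads can receive the same slot.
    unsigned int slot = atomicAdd(counter, 1u);
    if (slot < capacity) {  // results that do not fit are dropped
        out[slot].value  = v;
        out[slot].source = tid;
    }
}

// Hypothetical host helper: runs the kernel and returns the stored results.
// The order of the results is not deterministic.
std::vector<Result> runAppend(const std::vector<float>& host_in, unsigned int capacity)
{
    int n = static_cast<int>(host_in.size());
    if (n == 0) return {};

    float* d_in = nullptr;
    Result* d_out = nullptr;
    unsigned int* d_counter = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, capacity * sizeof(Result));
    cudaMalloc(&d_counter, sizeof(unsigned int));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_counter, 0, sizeof(unsigned int));  // the counter must start at zero

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    appendResults<<<blocks, threads>>>(d_in, n, d_out, capacity, d_counter);

    // The counter holds the number of results produced, which may exceed capacity.
    unsigned int produced = 0;
    cudaMemcpy(&produced, d_counter, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    unsigned int stored = produced < capacity ? produced : capacity;

    // Copy back only the slots that were actually written.
    std::vector<Result> host_out(stored);
    cudaMemcpy(host_out.data(), d_out, stored * sizeof(Result), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_counter);
    return host_out;
}
```

Copying back only the first "stored" entries, instead of the whole array, avoids reading slots that no thread wrote. An uninitialized counter or reading unwritten slots are two plausible sources of unexpected zeros, though the original post does not include enough code to confirm either.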

However, performance will suffer if many of your threads write results, because the atomicAdd() calls on a single counter will (indirectly) serialize all the writes. In that case, let each thread write its result, if any, to a slot set aside for that thread, and then use a stream compaction algorithm (see thrust::copy_if) to gather up the results.
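The per-thread-slot approach can be sketched as follows. This is an illustration under stated assumptions, not the poster's actual code: the kernel computePerThread, the helper runCompaction, the use of -1.0f as a "no result" sentinel, and the "positive values are results" rule are all hypothetical. Only thrust::copy_if, thrust::copy, thrust::device_vector and thrust::raw_pointer_cast are real Thrust facilities.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <vector>

// Each thread writes to its own slot, so no atomics are needed.
// Assumption: valid results are positive, and -1.0f marks "no result".
__global__ void computePerThread(const float* in, int n, float* slots)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    float v = in[tid];
    slots[tid] = (v > 0.0f) ? v : -1.0f;
}

// Predicate for copy_if: keep every slot that does not hold the sentinel.
struct IsResult {
    __host__ __device__ bool operator()(float v) const { return v >= 0.0f; }
};

// Hypothetical host helper: fills the per-thread slots, then compacts them.
std::vector<float> runCompaction(const std::vector<float>& host_in)
{
    int n = static_cast<int>(host_in.size());
    if (n == 0) return {};

    thrust::device_vector<float> d_in(host_in.begin(), host_in.end());
    thrust::device_vector<float> d_slots(n);
    thrust::device_vector<float> d_out(n);  // worst case: every thread has a result

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    computePerThread<<<blocks, threads>>>(thrust::raw_pointer_cast(d_in.data()), n,
                                          thrust::raw_pointer_cast(d_slots.data()));

    // copy_if returns an iterator one past the last element it wrote.
    auto out_end = thrust::copy_if(d_slots.begin(), d_slots.end(),
                                   d_out.begin(), IsResult());
    d_out.resize(out_end - d_out.begin());

    std::vector<float> host_out(d_out.size());
    thrust::copy(d_out.begin(), d_out.end(), host_out.begin());
    return host_out;
}
```

Because thrust::copy_if is stable, the compacted results keep the order of the threads that produced them, which the atomicAdd() approach does not guarantee. The cost is a temporary array with one slot per thread.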
