I have an array of unsigned integers stored on the GPU with CUDA (typically 1000000 elements). I would like to count the occurrences of every number in the array. There are only a few distinct numbers (about 10), but they can range from 1 to 1000000. About nine tenths of the elements are 0, and I don’t need their count. The result looks something like this:
58458 -> 1000 occurrences
15 -> 412 occurrences
I have an implementation using atomicAdd, but it is too slow (a lot of threads write to the same address). Does anyone know of a fast, efficient method?
You can implement a histogram by first sorting the numbers, and then doing a keyed reduction.
The most straightforward method would be to use thrust::sort and then thrust::reduce_by_key. It’s also often much faster than ad hoc binning based on atomics. Here’s an example.
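As a minimal sketch of the sort-then-reduce approach, assuming the data already lives in a thrust::device_vector: the helper name sparse_histogram and the is_nonzero predicate are hypothetical, introduced only for illustration. Dropping the zeros with thrust::copy_if before sorting is an extra step not mentioned above, added because the question says zeros make up most of the array and their count is not needed.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

// Hypothetical predicate: selects the values we want to count (everything except 0).
struct is_nonzero
{
    __host__ __device__ bool operator()(unsigned int x) const { return x != 0; }
};

// Hypothetical helper: fills `values` with the distinct nonzero numbers found in
// `input` (in ascending order) and `counts` with how often each one occurs.
void sparse_histogram(const thrust::device_vector<unsigned int>& input,
                      thrust::device_vector<unsigned int>& values,
                      thrust::device_vector<unsigned int>& counts)
{
    // Keep only the nonzero entries so the sort runs on a fraction of the data.
    thrust::device_vector<unsigned int> keys(input.size());
    auto keys_end = thrust::copy_if(input.begin(), input.end(), keys.begin(), is_nonzero());
    keys.resize(keys_end - keys.begin());

    // Sorting places equal values next to each other.
    thrust::sort(keys.begin(), keys.end());

    // Worst case: every remaining key is distinct.
    values.resize(keys.size());
    counts.resize(keys.size());

    // Sum a stream of 1s over each run of equal keys: one (value, count) pair per run.
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::constant_iterator<unsigned int>(1),
                                      values.begin(), counts.begin());

    // Trim the outputs to the number of distinct values actually found.
    size_t num_distinct = ends.first - values.begin();
    values.resize(num_distinct);
    counts.resize(num_distinct);
}
```

After calling sparse_histogram(input, values, counts), values[i] occurs counts[i] times in the input. No thread ever contends for a shared counter, which is what makes the atomic version slow when only about 10 distinct values are present.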