I have a kernel which produces an array of result values and I want to find the maximum of these values efficiently. The Array is initialized in the beginning of the kernel with some negative value (for example -1). The Kernel is executed using 5 blocks each with 256 threads, for example.
Here are the problems:
-
Because of my data, i must terminate threads, that are not valid, so I am working sometimes with 256 threads, sometimes 50, 20 and so on.
-
In shared memory are written results from block, but as I mentioned, some array has 50 results, some has 256 results…(so shared array looks like this) 8,6,4,9,1,-1,-1,-1…
-
In that case how to efficiently find the maximum in one block ?
Parallel reduction will be complicated on these types of array, isn’t it ? How to do this ?
There is not enough information about your algorithm.
What do you mean by n results?
Are the ignored values in the array set to -1 or do you use dynamic extern shared memory and threads only write up to index n (sounds hard to realize)?
Use a fixed size shared memory and set ignored values to -1 and still perform parallel reduction, if you search positive maximum and its filled with -1 it doesn’t matter, or
Don’t terminate threads, instead set a flag in another array if thread should not perform further calculations and still do parallel reduction