I have optimized code for a parallel (exclusive) scan algorithm, written in OpenCL.
I’ve read that the inner (dot) product of two vectors is based on parallel reduction, but I was wondering: is it somehow possible to use this already finished scan algorithm for that purpose?
A dot product is by definition a reduction: multiply the elements pairwise, then sum the products. A reduction is not too hard to implement, and even a moderately optimized version is much faster than a scan, because a scan has to compute and store every prefix sum while a reduction only needs the final total. You are better off writing a fast reduction that you can use.
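To make the relationship concrete, here is a minimal host-side sketch in Python (not OpenCL, and not your kernel). It assumes an exclusive scan where `out[i]` is the sum of all elements before index `i`. The function names `exclusive_scan`, `dot_via_scan` and `dot_via_reduction` are made up for illustration. It shows that the scan can give you the dot product, but only through its last element plus the last product, so everything else the scan computed is wasted work.

```python
def exclusive_scan(values):
    """Exclusive prefix sum: out[i] = sum(values[0:i]), so out[0] == 0."""
    out = []
    running = 0
    for v in values:
        out.append(running)
        running += v
    return out


def dot_via_scan(a, b):
    """Dot product recovered from an exclusive scan of the elementwise products."""
    products = [x * y for x, y in zip(a, b)]
    if not products:
        return 0
    scanned = exclusive_scan(products)
    # An exclusive scan stops one element short of the full sum,
    # so the last product has to be added back in.
    return scanned[-1] + products[-1]


def dot_via_reduction(a, b):
    """Dot product via a pairwise (tree) reduction, serial stand-in for a work-group."""
    partial = [x * y for x, y in zip(a, b)]
    if not partial:
        return 0
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)  # pad odd lengths with the additive identity
        half = len(partial) // 2
        # Each "work-item" i adds the element half a stride away.
        partial = [partial[i] + partial[i + half] for i in range(half)]
    return partial[0]
```

Both functions return the same value for the same input. The difference is that the scan version produces a full array of prefix sums and then throws all but one away, while the reduction version only ever keeps the partial sums it still needs.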