Is there a way to do a parallel reduction over an array on the CPU in C/C++? I recently learnt that it’s not possible using OpenMP. Are there any other alternatives?
Added: Note that you can implement “custom” reduction with OpenMP, in the way described here.
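One common way to hand-roll such a reduction (not necessarily the approach from the link above) is to let each thread accumulate into a private array and merge the private arrays in a critical section. A minimal sketch, assuming a row-major N x M matrix stored in a std::vector; the function name column_sums is made up for illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums the n rows of a row-major n x m matrix
// into a single vector of length m.
std::vector<double> column_sums(const std::vector<double>& a,
                                std::size_t n, std::size_t m) {
    std::vector<double> total(m, 0.0);
    #pragma omp parallel
    {
        // Each thread accumulates into its own private array.
        std::vector<double> local(m, 0.0);
        #pragma omp for nowait
        for (long i = 0; i < static_cast<long>(n); ++i)
            for (std::size_t j = 0; j < m; ++j)
                local[j] += a[i * m + j];
        // Merge the private arrays into the shared result, one thread at a time.
        #pragma omp critical
        for (std::size_t j = 0; j < m; ++j)
            total[j] += local[j];
    }
    return total;
}
```

With GCC or Clang this needs -fopenmp to run in parallel; without the flag the pragmas are ignored and the function still computes the same result serially.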
For C++: with parallel_reduce in Intel’s TBB (SO tag: tbb), you can perform a reduction on complex types such as arrays and structs, though the amount of code required can be significantly larger than with OpenMP’s reduction clause.

As an example, let’s parallelize a naive implementation of matrix-to-vector multiplication: y = Cx. The serial code consists of two loops.

Usually, to parallelize it, the loops are exchanged so that the outer loop iterations become independent and can be processed in parallel.
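The two loop orders can be sketched roughly as follows. This is an illustration under assumptions, not tested production code: C is taken to be stored row-major as N rows of M values, so that y[j] is the sum over i of C[i][j] * x[i], and the function names are made up.

```cpp
#include <cstddef>
#include <vector>

// Serial version: the outer loop runs over the N inputs, so every outer
// iteration updates all of y and the iterations are not independent.
std::vector<double> multiply_serial(const std::vector<double>& C,
                                    const std::vector<double>& x,
                                    std::size_t N, std::size_t M) {
    std::vector<double> y(M, 0.0);
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < M; ++j)
            y[j] += C[i * M + j] * x[i];
    return y;
}

// Swapped version: each outer iteration owns exactly one y[j], so the
// outer loop can be split across threads without any reduction.
std::vector<double> multiply_swapped(const std::vector<double>& C,
                                     const std::vector<double>& x,
                                     std::size_t N, std::size_t M) {
    std::vector<double> y(M, 0.0);
    #pragma omp parallel for
    for (long j = 0; j < static_cast<long>(M); ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < N; ++i)
            sum += C[i * M + j] * x[i];
        y[j] = sum;
    }
    return y;
}
```

Note that the swapped version only exposes M independent iterations, and its inner loop walks C with a stride of M.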
However, this is not always a good idea. If M is small and N is large, swapping the loops won’t give enough parallelism (for example, think of calculating a weighted centroid of N points in M-dimensional space, with C being the array of points and x being the array of weights). So a reduction over an array (i.e. a point) would be helpful. Here is how it can be done with TBB (sorry, the code was not tested, errors are possible):

Disclaimer: I am affiliated with TBB.