This question comes from a discussion that was touched off on this other question:
Parallelize already linear-time algorithm. It is not homework.
You are given an array of N numbers, and a machine with P processors and a shared CREW memory (Concurrent Read, Exclusive Write memory).
What is the tightest upper bound on the fastest algorithm to find the largest number in the array? [Obviously, also: What is the algorithm itself?]
I am asking about parallel running time, not the total amount of work performed [which can never be less than Ω(N), since every element must be examined at least once].
I think it's O(N/P') + O(log2(P')), where P' = min{N, P}.
P' processors each search for the max of N/P' elements, followed by log2(P') rounds of pairwise merges done in parallel. The first P'/2 merges are done by even-numbered processors, the next P'/4 by processors at locations divisible by 4, then by 8, and so on.
Edit: P' is introduced to cover the case where you have significantly more processors than elements to search.
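To make the step count concrete, here is a minimal sequential simulation of the two-phase scheme described above. It is only a sketch: the function name `crew_max` and the way "steps" are counted (one step per element scanned in phase 1, one step per merge round in phase 2) are my own assumptions for illustration, and it does not run on real parallel hardware.

```python
import math

def crew_max(values, p):
    """Simulate the two-phase max on p processors.

    Returns (maximum, parallel_steps). Hypothetical helper for illustration:
    the loops below run sequentially, but each loop body stands for work
    that the processors would do simultaneously.
    """
    n = len(values)
    if n == 0 or p < 1:
        raise ValueError("need a non-empty array and at least one processor")

    p_eff = min(n, p)                 # P' = min{N, P}
    chunk = math.ceil(n / p_eff)      # elements scanned by each processor

    # Phase 1: every processor scans its own chunk and writes its own slot.
    partial = [max(values[i:i + chunk]) for i in range(0, n, chunk)]
    steps = chunk                     # all chunks are scanned in parallel

    # Phase 2: tree of pairwise merges. In the round with a given stride,
    # the processor at index i (i divisible by 2 * stride) reads slot
    # i + stride and writes only slot i, so writes stay exclusive.
    stride = 1
    while stride < len(partial):
        for i in range(0, len(partial) - stride, 2 * stride):
            partial[i] = max(partial[i], partial[i + stride])
        steps += 1                    # one parallel round
        stride *= 2

    return partial[0], steps
```

For example, `crew_max([3, 1, 4, 1, 5, 9, 2, 6], 4)` scans chunks of 2 elements and then needs 2 merge rounds, giving 4 steps in total, which matches N/P' + log2(P') = 2 + 2. With more processors than elements, as in `crew_max([7, 2, 5], 100)`, only 3 processors are used.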