I have an MPI program with an array of data. Every rank needs the whole array to do its work, but only works on one patch of it. After each calculation step, every rank needs to communicate its computed piece of the array to all other ranks.
How do I achieve this efficiently?
In pseudo code I would do something like this as a first approach:
if rank == 0: // only master rank
    initialise_data()
end if
MPI_Bcast(all_data, 0) // from master to every rank
compute which part of the data to work on
for ( several steps ): // each rank
    execute_computation(part_of_data)
    for ( each rank ):
        MPI_Bcast(part_of_data, rank_number) // from every rank to every rank
    end for
end for
The disadvantage is that there are as many broadcasts, i.e. barriers, as there are ranks. So how would I replace the MPI_Bcasts?
edit: I just might have found a hint… Is it MPI_Allgather I am looking for?
Yes, you are looking for MPI_Allgather. Note that recvcount is not the length of the whole receive buffer, but the amount of data that should be received from one process. Analogously, in MPI_Allgatherv, recvcounts[i] is the amount of data you want to receive from the i-th process. Moreover, recvcount should be equal to (not less than) the respective sendcount. I tested it on my implementation (OpenMPI), and when I tried to receive fewer elements than were sent, I got an MPI_ERR_TRUNCATE error.

Also, in some rare cases I used MPI_Allreduce for that purpose. For example, if every process holds a full-length array in which only its own piece (AA, CC or BB) is filled in and everything else is zero, then we can do an Allreduce with the MPI_SUM operation and get AACCBB in all processes. Obviously, the same trick can be done with ones instead of zeros and MPI_PROD instead of MPI_SUM.