I'm studying the new process manager that now ships with MPICH2, but so far I can't figure out what the big advance of this implementation is. Does anyone know a good tutorial, or have some experience with it?
The Argonne wiki page is rather too basic: http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
From the point of view of where I work, the biggest single advance is the scalability of process launching. With the previous process launchers in MPICH2-based MPI implementations, launching jobs of 8,000+ tasks was unusably slow and frequently failed due to timeouts or other network problems, which all but ruled out MPICH2-based MPIs for our largest jobs. Hydra, by contrast, has a good hierarchical launch model that can also take advantage of your resource manager.
The topology-aware allocation strategies are good too, but compared to the difference between job startup failing (or taking hours) and jobs succeeding, they are a second-order effect.
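To see why a hierarchical launch scales so much better than a flat one, here is a toy model. It is a sketch under simplifying assumptions, not a description of Hydra's actual internals: a flat launcher contacts every node one after another, while in the tree model each newly started launcher starts up to `fanout` more in the next round. The function names and the `fanout` parameter are hypothetical and chosen for illustration only.

```python
def flat_launch_steps(num_nodes: int) -> int:
    # Toy model: a single launcher contacts every node sequentially,
    # so the number of serial steps grows linearly with the node count.
    return num_nodes


def tree_launch_rounds(num_nodes: int, fanout: int) -> int:
    # Toy model of a hierarchical (k-ary tree) launch: in each round, every
    # launcher started in the previous round starts `fanout` new ones.
    # Returns the number of rounds needed to reach `num_nodes` nodes.
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if num_nodes <= 0:
        return 0
    launched = 0
    frontier = 1  # launchers able to start others this round (initially just mpiexec)
    rounds = 0
    while launched < num_nodes:
        new = frontier * fanout
        launched += new
        frontier = new
        rounds += 1
    return rounds


if __name__ == "__main__":
    # 8000 nodes: 8000 serial steps flat vs. 3 rounds with a fan-out of 32.
    print(flat_launch_steps(8000), tree_launch_rounds(8000, 32))
```

In this model the number of launch rounds grows logarithmically with job size instead of linearly, which is the intuition behind why a hierarchical launcher stays usable at thousands of tasks where a flat one times out.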