The usual way to speed up an application is to parallelize it using MPI, or a higher-level library such as PETSc that uses MPI under the hood.
However, nowadays everyone seems to be interested in using CUDA to parallelize their applications, or a hybrid of MPI and CUDA for larger, more ambitious problems.
Is there any noticeable advantage in using a hybrid MPI+CUDA programming model over the traditional, tried and tested MPI model of parallel programming? I am asking this specifically for the application domain of particle methods.
One reason I am asking is that everywhere on the web I see the statement that “particle methods map naturally to the architecture of GPUs”, or some variation of it. But nobody ever seems to justify why I would be better off using CUDA than using just MPI for the same job.
This is a bit apples and oranges.
MPI and CUDA solve fundamentally different problems. MPI is a message-passing interface that lets you distribute your application over several nodes, while CUDA is a programming model that lets you use the GPU within the local node. If the parallel processes in your MPI program take too long to finish, then yes, you should look into whether they could be sped up by doing their work on the GPU instead of the CPU. Conversely, if your CUDA application still takes too long to finish, you may want to distribute the work across multiple nodes using MPI.
The two technologies are pretty much orthogonal (assuming all the nodes on your cluster are CUDA-capable).
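To make the two levels concrete, here is a minimal pure-Python sketch. It makes no real MPI or CUDA calls: the outer loop over ranks stands in for MPI processes, and the inner per-particle loop stands in for what a CUDA kernel would do on each node. The function names (`split_range`, `force_on`, `rank_work`) and the 1D inverse-square force are assumptions made up for illustration, not anything from a real library.

```python
def split_range(n_items, n_ranks, rank):
    """Contiguous block of indices owned by `rank` (the MPI-level split)."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def force_on(i, positions):
    """Work for one particle; on a GPU this would be one thread's job."""
    xi = positions[i]
    total = 0.0
    for j, xj in enumerate(positions):
        if j != i:
            d = xj - xi
            total += d / (abs(d) ** 3)  # toy 1D inverse-square attraction
    return total


def rank_work(rank, n_ranks, positions):
    """What one (hypothetical) MPI process computes: its own block only."""
    start, stop = split_range(len(positions), n_ranks, rank)
    # Iterations are independent of each other, so this is the loop
    # a GPU kernel could run in parallel within the node.
    return [force_on(i, positions) for i in range(start, stop)]


positions = [0.0, 1.0, 2.5, 4.0, 7.0]
n_ranks = 2

# Stand-in for an MPI gather: concatenate each rank's block in rank order.
hybrid = []
for rank in range(n_ranks):
    hybrid.extend(rank_work(rank, n_ranks, positions))

# Reference: the same forces computed without any decomposition.
serial = [force_on(i, positions) for i in range(len(positions))]
print(hybrid == serial)
```

The split across ranks and the per-particle loop do not interfere with each other, which is the sense in which the two technologies are orthogonal: you can change how many nodes share the particles without touching the per-particle work, and vice versa.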