I'm doing research on GPUs in cluster environments that use MPI for communication

Question


Asked: May 26, 2026


I'm doing research on GPUs in cluster environments that use MPI for communication.
To compare speedups, I'm thinking of creating:

A GPU-only matrix multiplication: done.
A CPU-only matrix multiplication: done.
But I can't find a good implementation of CUDA + MPI matrix multiplication.

Does anyone have a hint about where I can find one, or can anyone suggest an implementation?


1 Answer

Editorial Team · Answer 1 · 2026-05-26T17:05:58+00:00

The MTL4 Matrix Template Library can be a great starting point. MTL4 currently supports multi-core and DMM, and we are almost done with a full GPU implementation. Peter and I have discussed distributed GPU algorithms, but because our current focus is driven by PDE solvers, distributed GPU algorithms are hard to make competitive with a robust DMM implementation.

However, I am working on a new set of geophysics/medical imaging solvers that is better suited to distributed GPU computation, because the data sets are more modest and the GPU's video capabilities are beneficial.

To get started, take a look at the MTL4 tutorial.
