I am trying to resolve "Fatal Error in MPI_Irecv: Aborting Job" and have received mixed (useful but incomplete) responses to that query.
The error message is the following:
aborting job:
Fatal error in MPI_Irecv: Other MPI error, error stack:
MPI_Irecv(143): MPI_Irecv(buf=0x8294a60, count=48, MPI_DOUBLE, src=2, tag=-1, MPI_COMM_WORLD, request=0xffffd6ac) failed
MPID_Irecv(64): Out of memory
I am looking for help with the following questions (I need guidance to debug and resolve this deadlock):
- At the end of an MPI non-blocking send and receive, is the memory freed by itself after the send/receive has completed, or does it have to be freed explicitly?
- Will the "Out of memory" issue be resolved if I use multiple cores instead of a single one? We presently have 4 processors to 1 core and I submit my job using the following command: mpirun -np 4 <file>. I tried using mpirun n -4 <file> but it still ran 4 threads on the same core.
- How do I figure out how much shared memory is required for my program?

The MPI_Isend/MPI_Irecv calls are inside a recursive loop in my code, so it is not clear whether the source of the error lies there (if I use the Send/Recv commands just once or twice, the system computes just fine without "Out of memory" issues). If so, how does one check for and fix this?
#include <mpi.h>
#define Rows 48
double *A = new double[Rows];
double *AA = new double[Rows];
....
....
int main (int argc, char *argv[])
{
    MPI_Status status[8];
    MPI_Request request[8];
    MPI_Init (&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    while (time < final_time){
        ...
        ...
        for (i=0; i<Columns; i++)
        {
            for (y=0; y<Rows; y++)
            {
                if ((my_rank) == 0)
                {
                    MPI_Isend(A, Rows, MPI_DOUBLE, my_rank+1, 0, MPI_COMM_WORLD, &request[1]);
                    MPI_Irecv(AA, Rows, MPI_DOUBLE, my_rank+1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[3]);
                    MPI_Wait(&request[3], &status[3]);
                    MPI_Isend(B, Rows, MPI_DOUBLE, my_rank+2, 0, MPI_COMM_WORLD, &request[5]);
                    MPI_Irecv(BB, Rows, MPI_DOUBLE, my_rank+2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[7]);
                    MPI_Wait(&request[7], &status[7]);
                }
                if ((my_rank) == 1)
                {
                    MPI_Irecv(CC, Rows, MPI_DOUBLE, my_rank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[1]);
                    MPI_Wait(&request[1], &status[1]);
                    MPI_Isend(Cmpi, Rows, MPI_DOUBLE, my_rank-1, 0, MPI_COMM_WORLD, &request[3]);
                    MPI_Isend(D, Rows, MPI_DOUBLE, my_rank+2, 0, MPI_COMM_WORLD, &request[6]);
                    MPI_Irecv(DD, Rows, MPI_DOUBLE, my_rank+2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[8]);
                    MPI_Wait(&request[8], &status[8]);
                }
                if ((my_rank) == 2)
                {
                    MPI_Isend(E, Rows, MPI_DOUBLE, my_rank+1, 0, MPI_COMM_WORLD, &request[2]);
                    MPI_Irecv(EE, Rows, MPI_DOUBLE, my_rank+1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[4]);
                    MPI_Wait(&request[4], &status[4]);
                    MPI_Irecv(FF, Rows, MPI_DOUBLE, my_rank-2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[5]);
                    MPI_Wait(&request[5], &status[5]);
                    MPI_Isend(Fmpi, Rows, MPI_DOUBLE, my_rank-2, 0, MPI_COMM_WORLD, &request[7]);
                }
                if ((my_rank) == 3)
                {
                    MPI_Irecv(GG, Rows, MPI_DOUBLE, my_rank-1, MPI_ANY_TAG, MPI_COMM_WORLD, &request[2]);
                    MPI_Wait(&request[2], &status[2]);
                    MPI_Isend(G, Rows, MPI_DOUBLE, my_rank-1, 0, MPI_COMM_WORLD, &request[4]);
                    MPI_Irecv(HH, Rows, MPI_DOUBLE, my_rank-2, MPI_ANY_TAG, MPI_COMM_WORLD, &request[6]);
                    MPI_Wait(&request[6], &status[6]);
                    MPI_Isend(H, Rows, MPI_DOUBLE, my_rank-2, 0, MPI_COMM_WORLD, &request[8]);
                }
            }
        }
    }
Thanks!
You have a memory leak in your program: each MPI_Isend/MPI_Irecv pair that is followed by a single MPI_Wait on the receive leaks resources associated with the MPI_Isend request. You call this Rows*Columns times per iteration, over presumably many iterations, but you’re only calling Wait for one of the two requests. You presumably need to be doing an MPI_Waitall() for the two requests.

But beyond that, your program is very confusing. No sensible MPI program should have such a series of if (rank == ...) statements. And since you’re not doing any real work between the nonblocking send/receives and the Waits, I don’t understand why you’re not just using MPI_Sendrecv or something. What is your program trying to accomplish?

UPDATE
Ok, so it looks like you’re doing the standard halo-filling thing. A few things:
- Each task does not need its own arrays – A/AA for rank 0, B/BB for rank 1, etc. The memory is distributed, not shared; no rank can see the others’ arrays, so there’s no need to worry about overwriting them. (If there were, you wouldn’t need to send messages.) Besides, think how much harder this makes running on different numbers of processes – you’d have to add new arrays to the code each time you changed the number of processors you use.
- You can read/write directly into the V array rather than using copies, although the copies may be easiest to understand initially.
I’ve written here a little version of a halo-filling code using your variable names (Tmyo, Nmyo, V, indices i and y, etc). Each task has only its piece of the wider V array, and exchanges its edge data with only its neighbours. It uses characters so you can see what’s going on. It fills in its part of the V array with its rank #, and then exchanges its edge data with its neighbours.

I’d STRONGLY encourage you to sit down with an MPI book and work through its examples. I’m fond of Using MPI, but there are many others. There are also a lot of good MPI tutorials out there. I think it’s no exaggeration to say that 95% of MPI books and tutorials (eg, ours here – see parts 5 and 6) will go through exactly this procedure as one of their first big worked examples. They will call it halo-filling or guardcell filling or boundary exchange or something, but it all comes down to passing edge data.
The above program can be simplified still further using MPI_Cart_create to create your multidimensional domain and calculate your neighbours for you automatically, but I wanted to show you the logic so you see what’s going on.

Also, if you can take some advice from someone who’s done this for a long time:
- Any time you have line after line of repeated code, like 60 (!!) lines of essentially the same statement, that’s a sign you aren’t using the right data structures. Here, you almost certainly want to have a 3d array of state variables, with (probably) the 3rd index being the species or local state variable or whatever you want to call i2, i1f, i1s, etc. Then all these lines can be replaced with a loop, and adding a new local state variable becomes much simpler.
- Similarly, having essentially all your state defined as global variables is going to make your life much tougher when it comes to updating and maintaining the code. Again, this is probably partly related to having things in zillions of independent state variables instead of having structures or higher-dimensional arrays grouping all relevant data together.
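To make the last two points concrete, here is a small sketch of one way to group the state, assuming three per-cell variables named after i2, i1f and i1s. The State struct, the StateVar enum and scale_all are hypothetical names for illustration, and the flattened storage is just one possible layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical list of per-cell state variables; add an entry before
// NUM_VARS to introduce a new one.
enum StateVar { I2 = 0, I1F, I1S, NUM_VARS };

// All state lives in one object instead of many separate global arrays.
struct State {
    int rows, cols;
    std::vector<double> data;  // flattened [row][col][var]

    State(int r, int c)
        : rows(r), cols(c),
          data(static_cast<std::size_t>(r) * c * NUM_VARS, 0.0) {}

    // Access with the state variable as the third index.
    double& at(int row, int col, int var) {
        assert(row >= 0 && row < rows && col >= 0 && col < cols);
        assert(var >= 0 && var < NUM_VARS);
        return data[(static_cast<std::size_t>(row) * cols + col) * NUM_VARS + var];
    }
};

// One loop nest replaces a block of near-identical lines, one per variable.
void scale_all(State& s, double factor) {
    for (int r = 0; r < s.rows; ++r)
        for (int c = 0; c < s.cols; ++c)
            for (int v = 0; v < NUM_VARS; ++v)
                s.at(r, c, v) *= factor;
}
```

With this layout, passing a State by reference replaces the globals, and a new variable only needs a new enum entry rather than another set of hand-written lines.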