Editorial Team
Asked: May 14, 2026 at 5:33 am

When attempting to run the first example in the boost::mpi tutorial, I was unable to run across more than two machines. Specifically, this seemed to run fine:

mpirun -hostfile hostnames -np 4 boost1

with each hostname in hostnames listed as <node_name> slots=2 max_slots=2 (a full sample hostfile is shown after the job listing below). But when I increase the number of processes to 5, it just hangs. I have decreased slots/max_slots to 1, with the same result whenever I exceed two machines. On the nodes, this shows up in the job list:

<user> Ss orted --daemonize -mca ess env -mca orte_ess_jobid 388497408 \
-mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 -hnp-uri \
388497408.0;tcp://<node_ip>:48823
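
For reference, the hostnames file uses the Open MPI hostfile format; a minimal version, with placeholder node names, looks like this:

node1 slots=2 max_slots=2
node2 slots=2 max_slots=2
node3 slots=2 max_slots=2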

Additionally, when I kill it, I get this message:

node2- daemon did not report back when launched
node3- daemon did not report back when launched

The cluster is set up with the mpi and boost libs accessible on an NFS-mounted drive. Am I running into a deadlock with NFS, or is something else going on?

Update: To be clear, the boost program I am running is:

#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <iostream>
namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
  mpi::environment env(argc, argv);  // initializes MPI; finalizes it when env leaves scope
  mpi::communicator world;           // wraps MPI_COMM_WORLD
  std::cout << "I am process " << world.rank() << " of " << world.size()
            << "." << std::endl;
  return 0;
}
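
For reference, this is typically built with the MPI compiler wrapper and linked against Boost.MPI and Boost.Serialization (library names assume a standard Boost install):

mpic++ boost1.cpp -o boost1 -lboost_mpi -lboost_serialization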

Following @Dirk Eddelbuettel’s recommendations, I compiled and ran the MPI example hello_c.c, as follows:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world, I am %d of %d\n", rank, size);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();

    return 0;
}
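
A typical build and single-machine run for this example (assuming the Open MPI compiler wrappers; the binary name matches the mpirun commands below) would be:

mpicc hello_c.c -o hello
mpirun -np 4 ./hello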

It runs fine on a single machine with multiple processes, including when I ssh into any of the nodes and run it there. Each compute node is identical, with the working and mpi/boost directories mounted from a remote machine via NFS. When running the boost program from the fileserver (identical to a node except that boost/mpi are local), I am able to run on two remote nodes. For “hello world”, however, running the command mpirun -H node1,node2 -np 12 ./hello I get

[<node name>][[2771,1],<process #>] \
[btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] \
connect() to <node-ip> failed: No route to host (113)

while all of the “Hello world”s are printed, and it hangs at the end. However, the behavior differs when running from a compute node on a remote node.

Both “Hello world” and the boost code just hang with mpirun -H node1 -np 12 ./hello when run from node2, and vice versa. (They hang in the same sense as above: orted is running on the remote machine but not communicating back.)
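
One way to separate an MPI problem from a raw connectivity problem (a diagnostic sketch; the port number is arbitrary, and some netcat variants need nc -l -p 48823 instead of nc -l 48823) is to test a plain TCP connection between two nodes:

nc -l 48823       # on node1: listen on an arbitrary unprivileged TCP port
nc node1 48823    # on node2: connect; if this fails, MPI’s TCP connections will too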

The fact that the behavior differs between the fileserver, where the mpi libs are local, and a compute node suggests that I may be running into an NFS deadlock. Is this a reasonable conclusion? Assuming it is, how do I configure mpi so that I can link it statically? Additionally, I don’t know what to make of the error I get when running from the fileserver; any thoughts?
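
(For what it’s worth, Open MPI can be built as static libraries through its standard configure flags; a minimal sketch, with an assumed install prefix:)

./configure --prefix=/opt/openmpi-static --enable-static --disable-shared
make all install
/opt/openmpi-static/bin/mpicc hello_c.c -o hello   # the wrapper then links against the static libs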


1 Answer

  1. Editorial Team
     Answered on May 14, 2026 at 5:33 am

    The answer turned out to be simple: Open MPI authenticates via ssh and then opens up TCP/IP sockets between the nodes. The firewalls on the compute nodes were set up to accept only ssh connections from each other, not arbitrary connections. So, after updating iptables, hello world runs like a champ across all of the nodes.

    Edit: It should be pointed out that the fileserver’s firewall allowed arbitrary connections, which is why an MPI program run on it behaved differently from one run on the compute nodes.
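
    A minimal sketch of the kind of iptables change involved (the subnet is a placeholder for the cluster’s actual network):

    # on each compute node: accept TCP connections from other cluster machines
    iptables -A INPUT -p tcp -s 192.168.1.0/24 -j ACCEPT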
