The Archive Base

Editorial Team
Asked: May 28, 2026 (2026-05-28T01:56:55+00:00)

I have a RabbitMQ cluster with two nodes in production and the cluster is breaking with these error messages:

=ERROR REPORT==== 23-Dec-2011::04:21:34 ===
** Node rabbit@rabbitmq02 not responding **
** Removing (timedout) connection **

=INFO REPORT==== 23-Dec-2011::04:21:35 ===
node rabbit@rabbitmq02 lost 'rabbit'

=ERROR REPORT==== 23-Dec-2011::04:21:49 ===
Mnesia(rabbit@rabbitmq01): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbitmq02}

I tried to simulate the problem by killing the connection between the two nodes using "tcpkill". The cluster disconnected and, surprisingly, the two nodes did not try to reconnect!

When the cluster breaks, the HAProxy load balancer still marks both nodes as active and sends requests to both of them, although they are no longer in a cluster.

My questions:

  1. If the nodes are configured to work as a cluster, when I get a network failure, why aren’t they trying to reconnect afterwards?

  2. How can I identify a broken cluster and shut down one of the nodes? I have consistency problems when working with the two nodes separately.


1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 1:56 am (2026-05-28T01:56:56+00:00)

    Another way to recover from this kind of failure is to work directly with Mnesia, the database that RabbitMQ uses for persistence and for synchronizing the RabbitMQ instances. The master / slave status of the nodes is also controlled through it. For all the details, refer to the following URL: http://www.erlang.org/doc/apps/mnesia/Mnesia_chap7.html

    Adding the relevant section here:

    There are several occasions when Mnesia may detect that the network
    has been partitioned due to a communication failure.

    One is when Mnesia already is up and running and the Erlang nodes gain
    contact again. Then Mnesia will try to contact Mnesia on the other
    node to see if it also thinks that the network has been partitioned
    for a while. If Mnesia on both nodes has logged mnesia_down entries
    from each other, Mnesia generates a system event, called
    {inconsistent_database, running_partitioned_network, Node} which is
    sent to Mnesia’s event handler and other possible subscribers. The
    default event handler reports an error to the error logger.

    Another occasion when Mnesia may detect that the network has been
    partitioned due to a communication failure, is at start-up. If Mnesia
    detects that both the local node and another node received mnesia_down
    from each other it generates a {inconsistent_database,
    starting_partitioned_network, Node} system event and acts as described
    above.
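
One way to act on the events described above is to watch the node's log for them, since the default event handler writes them to the error logger. The following is a minimal sketch, assuming the log line has the same shape as the one in the question. The function name `find_partition_events` and the `SAMPLE_LOG` constant are hypothetical, and this is not part of RabbitMQ or Mnesia.

```python
import re

# Matches the Mnesia error line shown in the question's log excerpt.
# Assumption: other RabbitMQ/Erlang versions may format this line differently.
PARTITION_EVENT = re.compile(
    r"\{inconsistent_database,\s*"
    r"(running_partitioned_network|starting_partitioned_network),\s*"
    r"'?([^'}\s]+)'?\}"
)


def find_partition_events(log_text):
    """Return (event_kind, peer_node) pairs for every partition event found."""
    return [(m.group(1), m.group(2)) for m in PARTITION_EVENT.finditer(log_text)]


# Log line copied from the question, used here as sample input.
SAMPLE_LOG = """\
=ERROR REPORT==== 23-Dec-2011::04:21:49 ===
Mnesia(rabbit@rabbitmq01): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@rabbitmq02}
"""

if __name__ == "__main__":
    for kind, node in find_partition_events(SAMPLE_LOG):
        print(f"partition detected ({kind}) with peer {node}")
```

A monitoring job could run a check like this and alert an operator, who then decides which node to stop. The decision itself is left out of the sketch on purpose.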

    If the application detects that there has been a communication failure
    which may have caused an inconsistent database, it may use the
    function mnesia:set_master_nodes(Tab, Nodes) to pinpoint from which
    nodes each table may be loaded.

    At start-up Mnesia’s normal table load algorithm will be bypassed and
    the table will be loaded from one of the master nodes defined for the
    table, regardless of potential mnesia_down entries in the log. The
    Nodes may only contain nodes where the table has a replica and if it
    is empty, the master node recovery mechanism for the particular table
    will be reset and the normal load mechanism will be used when next
    restarting.

    The function mnesia:set_master_nodes(Nodes) sets master nodes for all
    tables. For each table it will determine its replica nodes and invoke
    mnesia:set_master_nodes(Tab, TabNodes) with those replica nodes that
    are included in the Nodes list (i.e. TabNodes is the intersection of
    Nodes and the replica nodes of the table). If the intersection is
    empty the master node recovery mechanism for the particular table will
    be reset and the normal load mechanism will be used at next restart.
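
The intersection rule in the paragraph above can be illustrated with a small model. This is a sketch of the described behaviour only, not Mnesia's implementation. The function name `master_nodes_per_table` and the table names are hypothetical.

```python
def master_nodes_per_table(nodes, replicas_by_table):
    """Model TabNodes = Nodes intersected with each table's replica nodes.

    An empty list for a table corresponds to the case where master node
    recovery is reset and the normal load mechanism is used at next restart.
    """
    result = {}
    for table, replicas in replicas_by_table.items():
        # Keep the caller's ordering of `nodes`, restricted to actual replicas.
        result[table] = [n for n in nodes if n in replicas]
    return result


if __name__ == "__main__":
    # Hypothetical tables and their replica nodes.
    replicas = {
        "table_a": {"rabbit@rabbitmq01", "rabbit@rabbitmq02"},
        "table_b": {"rabbit@rabbitmq02"},
    }
    print(master_nodes_per_table(["rabbit@rabbitmq01"], replicas))
```

In this model, choosing rabbit@rabbitmq01 as the only master leaves table_b with an empty list, because that table has no replica there. Per the text above, such a table falls back to the normal load mechanism.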

    The functions mnesia:system_info(master_node_tables) and
    mnesia:table_info(Tab, master_nodes) may be used to obtain information
    about the potential master nodes.

    Determining which data to keep after communication failure is outside
    the scope of Mnesia. One approach would be to determine which “island”
    contains a majority of the nodes. Using the {majority,true} option for
    critical tables can be a way of ensuring that nodes that are not part
    of a “majority island” are not able to update those tables. Note that
    this constitutes a reduction in service on the minority nodes. This
    would be a tradeoff in favour of higher consistency guarantees.
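
The "majority island" idea can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "majority" means strictly more than half of the cluster's nodes. The function name `has_majority` is hypothetical.

```python
def has_majority(island, cluster_nodes):
    """True if `island` holds a strict majority of the cluster's nodes."""
    all_nodes = set(cluster_nodes)
    # Ignore any names in `island` that are not cluster members.
    members = set(island) & all_nodes
    return len(members) * 2 > len(all_nodes)


if __name__ == "__main__":
    two_node_cluster = ["rabbit@rabbitmq01", "rabbit@rabbitmq02"]
    # After a split of a two-node cluster, each island has one node.
    print(has_majority(["rabbit@rabbitmq01"], two_node_cluster))
```

Under that assumption, a split of a two-node cluster like the one in the question leaves neither side with a majority. A majority rule alone therefore cannot pick a surviving node there.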

    The function mnesia:force_load_table(Tab) may be used to force load
    the table regardless of which table load mechanism is activated.

    This is a lengthier and more involved way of recovering from such failures, but it gives finer granularity and control over which data ends up on the final master node (this can reduce the amount of data loss that might happen when “merging” RabbitMQ masters).
