I am building a simple gen_server module that monitors the activity of multiple remote nodes.
When a remote node registers, the module starts monitoring it with erlang:monitor_node(Node, true). This happens only once per node (confirmed with logs).
In the gen_server's handle_info/2 callback, it catches the {nodedown, Node} message and stops monitoring the node with erlang:monitor_node(Node, false). I expect to receive this message only once: when the remote node goes down.
While testing the module, I found that when a remote node goes down, the gen_server receives hundreds of {nodedown, Node} messages (the number varies from a few hundred to a few thousand).
Why are multiple messages sent by monitor_node? How can I prevent this behaviour?
EDIT: here is (part of) the source code:
register_node(#node_info{node = NodeName} = NodeInfo) ->
    case mnesia:read(node_info, NodeName) of
        [] ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered", [NodeName]);
        [_OldInfo] ->
            error_logger:trace_msg("info of node ~p updated", [NodeName])
    end,
    mnesia:write(NodeInfo).

handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeStatus]) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
handle_cast({shutdown_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:dirty_delete_object(NodeStatus) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction shutdown_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
handle_cast(Message, Timer) ->
    error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
    {noreply, Timer}.

handle_info({nodedown, Node}, Timer) ->
    monitor_node(Node, false),
    error_logger:info_msg("~p: node ~p down", [?MODULE, Node]),
    mnesia:transaction(fun mnesia:delete/3, [node_info, Node, write]),
    {noreply, Timer};
handle_info(Message, Timer) ->
    error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
    {noreply, Timer}.
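You call monitor_node(NodeName, true) INSIDE the Mnesia transaction. monitor_node is a side effect outside the database, and a transaction fun is not a suitable place for it: Mnesia may abort and restart the fun (for example, on a lock conflict), so the fun can run many times before the transaction finally commits. Every run calls monitor_node(NodeName, true) again, and each call creates an independent monitor. When the node goes down, you receive one {nodedown, Node} message per call, which accounts for the hundreds you are seeing.
Move the monitor_node call out of the transaction (for example, have the transaction report whether the node is new and use a case expression on its result), and try again. See also: explanation of side effects in Mnesia transactions.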
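A minimal sketch of that fix follows. It assumes the node_info record has the fields node and status (the question does not show the record definition), and the module name node_registry_sketch is hypothetical. The transaction fun only reads and writes Mnesia and returns new or updated; handle_cast/2 then calls erlang:monitor_node/2 once, after the transaction has committed. It also reads with a write lock via mnesia:read/3, because the record is written right afterwards; this avoids a read-to-write lock upgrade, which can be one source of restarts. Only the two changed functions are shown, not the full gen_server.

```erlang
-module(node_registry_sketch).
-export([register_node/1, handle_cast/2]).

%% Assumption: the record definition is not in the question; fields are guessed.
-record(node_info, {node, status}).

%% Transaction fun: it only touches Mnesia, so it is harmless if Mnesia
%% restarts it. It reports whether the node is new instead of acting on it.
register_node(#node_info{node = NodeName} = NodeInfo) ->
    %% Take the write lock up front (as mnesia:wread/1 does), since the
    %% record is written right afterwards.
    Result = case mnesia:read(node_info, NodeName, write) of
                 [] -> new;
                 [_OldInfo] -> updated
             end,
    ok = mnesia:write(NodeInfo),
    Result.

handle_cast({register_node, #node_info{node = NodeName} = NodeStatus}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeStatus]) of
        {atomic, new} ->
            %% The side effect runs exactly once, after the commit.
            erlang:monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered", [NodeName]);
        {atomic, updated} ->
            error_logger:info_msg("info of node ~p updated", [NodeName]);
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p", [Reason])
    end,
    {noreply, Timer}.
```

With a single monitor per node, the existing handle_info/2 clause for {nodedown, Node} should then see one message when the node goes down.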