Consider the following scenario.
There are 2 Hazelcast nodes. One is stopped, the other is running under fairly heavy load.
Now the stopped node comes up. The application starts and its Hazelcast instance joins the first node. Hazelcast starts repartitioning the data. With 2 nodes, this essentially means
that every entry in each IMap gets copied to the new node, and the two nodes are arbitrarily assigned the master or backup role.
PROBLEM:
If the first node is brought down during this process, before replication has completed, part of the IMap contents and ITopic subscriptions may be lost.
QUESTION:
How can I ensure that the repartitioning process has finished and that it is safe to shut down the first node?
(The whole setup exists to enable software updates without downtime while preserving the current application state.)
I tried using getPartitionService().addMigrationListener(…), but the listener does not seem to cover the migration process as a whole. Instead, I get tens to hundreds of calls to migrationStarted()/migrationCompleted(), one pair for each chunk of the replication.
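The per-chunk callbacks can still be folded into a single "anything in flight?" signal by counting them. The following is only a minimal sketch with no Hazelcast dependency: `MigrationTracker` and its method names are hypothetical, and the idea is that a real MigrationListener would forward migrationStarted()/migrationCompleted() (and a failure callback, if the listener interface in your version has one) to it. Note that an in-flight count of zero only means no chunk is being migrated at that instant; more chunks may still be scheduled, so this is a heuristic rather than proof that repartitioning has finished.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: aggregates per-chunk migration callbacks into counters.
final class MigrationTracker {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();
    private final AtomicInteger failed = new AtomicInteger();

    // Call from the listener's "migration started" callback.
    void onStarted() {
        inFlight.incrementAndGet();
    }

    // Call from the listener's "migration completed" callback.
    void onCompleted() {
        inFlight.decrementAndGet();
        completed.incrementAndGet();
    }

    // Call from the listener's failure callback, if there is one.
    void onFailed() {
        inFlight.decrementAndGet();
        failed.incrementAndGet();
    }

    // True when no chunk migration is currently running (not a completion guarantee).
    boolean isIdle() {
        return inFlight.get() == 0;
    }

    int completedCount() {
        return completed.get();
    }

    int failedCount() {
        return failed.get();
    }
}
```

A non-zero failedCount() is a reason not to shut the old node down yet, even when isIdle() returns true.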
1- When you gracefully shut down the first node, the shutdown process should wait (block) until the data is safely backed up.
2- If you use Hazelcast Management Center, it shows the number of ongoing migration/repartitioning operations on the home screen.
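If the update script should check explicitly before stopping the old node, rather than rely only on the blocking shutdown, one option is to poll a "cluster is safe" condition with a timeout. The sketch below is deliberately independent of Hazelcast: `SafeShutdownGate` and `awaitSafe` are hypothetical names, and the condition is passed in as a `BooleanSupplier`. It assumes a Hazelcast version whose PartitionService exposes isClusterSafe() (3.3 and later, as far as I know), in which case the supplier could be `() -> hz.getPartitionService().isClusterSafe()`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper: waits until a safety condition holds or a timeout expires.
final class SafeShutdownGate {
    private SafeShutdownGate() {
    }

    // Returns true as soon as clusterSafe reports true, false on timeout or interrupt.
    static boolean awaitSafe(BooleanSupplier clusterSafe, long timeout, TimeUnit unit, long pollMillis) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (clusterSafe.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            long remainingMillis = Math.max(1, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                // Sleep for the poll interval, but never past the deadline.
                Thread.sleep(Math.min(pollMillis, remainingMillis));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and report "not known to be safe".
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The old node would then be shut down gracefully only when awaitSafe returns true; on false, the safer choice is to leave it running and investigate.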