If a mongo node is offline for too long and the oplog wraps before it comes back up, it can get stuck in a stale state and require manual intervention. How can I recognise that state from the replica set status document? Will it stay in state 3, which is also used by nodes in maintenance mode and presumably by nodes that can still catch up? If so, how can I tell the difference?
From http://docs.mongodb.org/manual/reference/replica-status/:
Number State
0 Starting up, phase 1 (parsing configuration)
1 Primary
2 Secondary
3 Recovering (initial syncing, post-rollback, stale members)
4 Fatal error
5 Starting up, phase 2 (forking threads)
6 Unknown state (the set has never connected to the member)
7 Arbiter
8 Down
9 Rollback
10 Removed
It will be in state 3, Recovering. To recognize the stale state specifically, you need to look for the errmsg field. When stale, the secondary in question will have an errmsg like this:
In terms of a full output, it would look something like this:
And finally, a code snippet to print out the error only, if it exists, from the shell:
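A minimal sketch of such a snippet, assuming the errmsg field appears on the affected member's entry in the members array of the status document, as described above. The helper name memberErrors is hypothetical; rs.status() and print() are the usual shell helpers.

```javascript
// Hypothetical helper: collect "name (state N): errmsg" for every member
// of a replica set status document that carries an errmsg field.
// Members without an errmsg (healthy, or recovering for other reasons)
// are skipped.
function memberErrors(status) {
  var out = [];
  (status.members || []).forEach(function (m) {
    if (m.errmsg) {
      out.push(m.name + " (state " + m.state + "): " + m.errmsg);
    }
  });
  return out;
}

// In the mongo shell, run it against the live status document:
//   memberErrors(rs.status()).forEach(function (line) { print(line); });
```

A member that shows state 3 but produces no line here is recovering without a reported error, which is how it can be told apart from a stale one under the assumption above.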