I’m writing a Linux server daemon. I want to know what the convention is in the UNIX/Linux community for what a daemon should do when it encounters a fatal error (e.g. a server failing to listen, a segmentation fault, etc.). I have already set up logging to the system log, but I want to know what to do with a fatal error. Should I log and keep running in an infinite, do-nothing loop? Should I log and exit? What is the standard thing to do here, and how do I do it?
The daemon is written in C++, and I’m using a custom exception system to wrap POSIX error codes, so I’ll know when things are fatal.
There are degrees of ‘fatal error’.
A server failing to listen is possibly a temporary issue; your daemon should probably keep trying, retrying periodically and backing off between attempts (1 second, 2 seconds, 4 seconds, etc.).
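The retry-and-back-off idea could look roughly like the sketch below. The helper name `retry_with_backoff` and its parameters are my own invention for illustration, not part of any library; `attempt` stands in for whatever wraps your `bind()`/`listen()` calls and returns true on success. I have also assumed you want a cap on the delay and on the number of tries, which the daemon would choose for itself.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call `attempt` until it succeeds or `max_tries`
// attempts have been made. The delay doubles after each failure
// (e.g. 1 s, 2 s, 4 s, ...) but never exceeds `max_delay`.
bool retry_with_backoff(const std::function<bool()>& attempt,
                        int max_tries,
                        std::chrono::milliseconds initial_delay,
                        std::chrono::milliseconds max_delay)
{
    std::chrono::milliseconds delay = initial_delay;
    for (int i = 0; i < max_tries; ++i) {
        if (attempt()) {
            return true;  // success, stop retrying
        }
        if (i + 1 < max_tries) {
            // Sleep only if another attempt will follow.
            std::this_thread::sleep_for(delay);
            delay = std::min(delay * 2, max_delay);
        }
    }
    return false;  // caller decides whether this is now fatal
}
```

If it returns false, that is probably the point at which the error has become genuinely fatal and the daemon should log and exit.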
If you catch a seg fault, maybe the best thing is for the daemon to try to restart itself by re-executing its own binary. The fault might recur, of course.
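A minimal sketch of the re-exec approach, under several assumptions: it is Linux-only (it relies on `/proc/self/exe`), the names `install_segv_reexec` and `reexec_on_segv` are hypothetical, and the exit status 70 is an arbitrary choice. The handler restricts itself to `execv()` and `_exit()`, both of which are async-signal-safe. `SA_RESETHAND` restores the default action so a second fault inside the handler kills the process, and `SA_NODEFER` keeps SIGSEGV from being left blocked in the signal mask that the new image inherits.

```cpp
#include <signal.h>
#include <unistd.h>

// Saved once at startup so the handler can re-run the same command line.
static char** g_saved_argv = nullptr;

// Hypothetical handler: replace the crashed process image with a fresh copy.
static void reexec_on_segv(int)
{
    if (g_saved_argv != nullptr) {
        // Linux-specific: /proc/self/exe refers to the running binary.
        execv("/proc/self/exe", g_saved_argv);
    }
    _exit(70);  // execv failed; 70 is an arbitrary status chosen for this sketch
}

// Call early in main(), passing main's argv. Returns 0 on success, -1 on error.
int install_segv_reexec(char** argv)
{
    g_saved_argv = argv;
    struct sigaction sa {};
    sa.sa_handler = reexec_on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER | SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, nullptr);
}
```

Nothing here stops a crash loop if the fault recurs immediately after restart, so you would probably want some limit on how often this is allowed to happen.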
You shouldn’t go into an infinite do-nothing loop; you should terminate rather than do that. If your loop isn’t infinite but could be broken by a signal or something, maybe do-nothing is OK; I recommend the pause() system call as a way to do nothing without consuming CPU time.
You should certainly log what you’re doing and why before you exit.