I have a problem with POSIX processes that I can’t get around.
I have a process which forks several children (the process tree can be complex, not just one level deep). It also keeps track of the PIDs of its active children. At some point the parent receives a signal (SIGINT, let’s say).
In the signal handler for SIGINT, it iterates over the list of child processes and sends them the same signal in order to prevent zombies. Now, the problem is that
- if the parent does not waitpid() for the child to stop, the signal never seems to be dispatched (the children keep running)
- if the parent waits after every kill() sent to a child, it simply hangs there and the child seems to ignore the signal
Parent and children have the same signal handler, as it’s installed before forking.
Here is the pseudocode:
    signal_handler( signal )
        foreach child in children
            kill( child, signal )
            waitpid( child, status )
        // Releasing system resources, etc.
        clean_up()
        // Restore signal handlers.
        set_signal_handlers_to_default()
        // Send back the expected "I exited after XY signal" to the parent by
        // executing the default signal handler again.
        kill( getpid(), signal )
With this implementation, execution hangs in waitpid. If I remove the waitpid, the children keep running.
My guess is that signals sent from a signal handler are not dispatched to the children until the handler has returned. But why aren’t they dispatched if I omit the wait?
Thanks a lot in advance!
What you describe should work and indeed it does, with the following testcase:
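A minimal sketch of a testcase in that spirit, written in Python for brevity and assuming three children that do nothing but wait for signals. The names `children`, `handler`, and `run_parent` are illustrative and not taken from the original program:

```python
import os
import signal

children = []  # PIDs of live children, tracked by the parent (illustrative name)


def handler(signum, frame):
    # Forward the signal to every tracked child, then reap it.
    for pid in children:
        os.kill(pid, signum)
        os.waitpid(pid, 0)
    # Restore the default disposition and re-raise the signal, so that
    # whoever waits on this process sees "terminated by signal".
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)


def run_parent(n_children=3):
    # The handler is installed before forking, so children inherit it.
    signal.signal(signal.SIGINT, handler)
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Assumption: a child must not reuse the parent's PID list,
            # otherwise it would signal and wait on its own siblings.
            children.clear()
            while True:
                signal.pause()  # sleep until a signal arrives
        children.append(pid)
    while True:
        signal.pause()
```

Calling `run_parent()` and then sending SIGINT to that process from another shell (for example with `kill -INT <pid>`) should terminate the children first and then the parent, with the parent reported as killed by SIGINT.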
If you see the parent hanging in waitpid, it means the child has not exited. Try attaching a debugger to see where the child is blocked or, easier, run the program with strace(1). How do you clean up your pid array? Make sure the children are not trying to call waitpid with a pid parameter <= 0. Make sure the children are not blocking or ignoring the signal.
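One way to check that last point from inside a process is to query its signal mask and the signal's current disposition. A minimal Python sketch, assuming Python 3 on a POSIX system; `signal_state` is a hypothetical helper, not part of any library:

```python
import signal


def signal_state(signum):
    """Return (blocked, ignored) for signum in the calling thread/process."""
    # Blocking an empty set changes nothing and returns the current mask.
    current_mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    blocked = signum in current_mask
    # getsignal() reports SIG_IGN when the signal is being ignored.
    ignored = signal.getsignal(signum) == signal.SIG_IGN
    return blocked, ignored
```

Calling `signal_state(signal.SIGINT)` in a child right after the fork shows whether the signal can reach the handler at all: if either value is true, the handler never runs and the parent's waitpid will not return.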