After a fork call, I have a father process that must send SIGUSR1 or SIGUSR2 (based on the value of the ‘cod’ variable) to its child. The child has to install the proper handlers before receiving SIGUSR1 or SIGUSR2. To do so, I pause the father until the child signals that it is done installing the handlers. The father is signaled with SIGUSR1, and the handler for this signal is installed before the fork call. However, the father never seems to return from pause(), which makes me think the SIGUSR1 handler is never actually called.
[...]
typedef enum{FALSE, TRUE} boolean;
boolean sigusr1setted = FALSE;
boolean sigusr2setted = FALSE;
void
sigusr1_handler0(int signo){
    return;
}
void
sigusr1_handler(int signo){
    sigusr1setted = TRUE;
}
void
sigusr2_handler(int signo){
    sigusr2setted = TRUE;
}
int main(int argc, char *argv[]){
    [...]
    if(signal(SIGUSR1, sigusr1_handler0) == SIG_ERR){
        perror("signal 0 error");
        exit(EXIT_FAILURE);
    }
    pid = fork();
    if (pid == 0){
        if(signal(SIGUSR1, sigusr1_handler) == SIG_ERR){
            perror("signal 1 error");
            exit(EXIT_FAILURE);
        }
        if(signal(SIGUSR2, sigusr2_handler) == SIG_ERR){
            perror("signal 2 error");
            exit(EXIT_FAILURE);
        }
        // kill() takes the target pid first, then the signal number
        kill(getppid(), SIGUSR1); // wake up parent by signaling him with sigusr1
        // Wait for the parent to send the signals...
        pause();
        if(sigusr1setted){
            if(execl("Prog1", "Prog1", (char*)0) < 0){
                perror("exec P1 error");
                exit(EXIT_FAILURE);
            }
        }
        if(sigusr2setted){
            if(execl("Prog2", "Prog2", (char*)0) < 0){
                perror("exec P2 error");
                exit(EXIT_FAILURE);
            }
        }
        // Shouldn't reach this point : something went wrong...
        exit(EXIT_FAILURE);
    }else if (pid > 0){
        // The father must wake only after the child is done with the handlers installation
        pause();
        // Never reaches this point ...
        if (cod == 1)
            kill(pid, SIGUSR1);
        else
            kill(pid, SIGUSR2);
        // Wait for the child to complete..
        if(wait(NULL) == -1){
            perror("wait 2 error");
            exit(EXIT_FAILURE);
        }
        [...]
    }else{
        perror("fork 2 error");
        exit(EXIT_FAILURE);
    }
    [...]
    exit(EXIT_SUCCESS);
}
Assembling a plausible answer from the comments – so this is Community Wiki from the outset. (If Oli provides an answer, up-vote that instead of this!)
Oli Charlesworth gave what is probably the core of the problem:
[...] pause().
ouah noted accurately:
[...] volatile sig_atomic_t type otherwise the code is undefined.
That said, POSIX allows a little more laxity than standard C does for what can be done inside a signal handler. We might also note that C99 provides <stdbool.h> to define the bool type.
The original poster commented:
[...]
Suggestion: Use usleep() (µ-sleep, or sleep in microseconds), or nanosleep() (sleep in nanoseconds)?
Or use a different synchronization mechanism, such as:
[...] open() calls return, both processes simply close the FIFO;
Note that there is no data communication between the two processes via the FIFO; the code is simply relying on the kernel to block the processes until there is a reader and a writer, so both processes are ready to go.
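The FIFO rendezvous described above could be sketched roughly as follows. This is only an illustration, not code from the question: the helper name fifo_rendezvous and its fifo_path parameter are made up for the example, and the spots where the child would install its handlers and the parent would call kill() are marked with comments only.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if parent and child both got past the
 * FIFO rendezvous, -1 on error. fifo_path is an illustrative parameter. */
int fifo_rendezvous(const char *fifo_path)
{
    if (mkfifo(fifo_path, 0600) == -1) {
        perror("mkfifo");
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        unlink(fifo_path);
        return -1;
    }

    if (pid == 0) {
        /* Child: the signal handlers would be installed here, before open().
         * Opening for writing blocks until the parent opens for reading. */
        int wfd = open(fifo_path, O_WRONLY);
        if (wfd == -1)
            _exit(EXIT_FAILURE);
        close(wfd);
        /* The real child would now wait for SIGUSR1/SIGUSR2. */
        _exit(EXIT_SUCCESS);
    }

    /* Parent: opening for reading blocks until the child opens for writing,
     * so once open() returns the child is past its setup code. */
    int ok = 1;
    int rfd = open(fifo_path, O_RDONLY);
    if (rfd == -1)
        ok = 0;
    else
        close(rfd);
    /* The real parent would call kill(pid, SIGUSR1 or SIGUSR2) here. */

    int status = 0;
    if (waitpid(pid, &status, 0) == -1)
        ok = 0;
    unlink(fifo_path); /* remove the FIFO whatever happened */

    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling fifo_rendezvous("/tmp/sync_fifo") from main() would exercise it; the path is arbitrary. Note that no byte is ever written to the FIFO, and that this sketch does not handle a child that dies before reaching open().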
Another possibility is that the parent process could try if (sigusr1setted == FALSE) pause(); to reduce the window for the race condition. However, it only reduces the window; it does not guarantee that the race condition cannot occur. That is, Murphy’s Law applies and the signal could arrive between the time the test is complete and the time the pause() is executed.
All of this goes to show that signals are not a very good IPC mechanism. They can be used for IPC, but they should seldom actually be used for synchronization.
Incidentally, there’s no need to test the return value of any of the exec*() family of functions. If the system call returns, it failed.
And the questioner asked again:
[...]
Semaphores would certainly be another valid mechanism for synchronizing the two processes. Since I’d certainly have to look at the manual pages for semaphores whereas I can remember how to use FIFOs without looking, I’m not sure that I’d actually use them, but creating and removing a FIFO has its own set of issues so it is not clear that it is in any way ‘better’ (or ‘worse’); just different. It’s mkfifo(), open(), close(), unlink() for FIFOs versus sem_open() (or sem_init()), sem_post(), sem_wait(), sem_close(), and maybe sem_unlink() (or sem_destroy()) for semaphores. You might want to think about registering a ‘FIFO removal’ or ‘semaphore cleanup’ function with atexit() to make sure the FIFO or semaphore is destroyed under as many circumstances as possible. However, that’s probably OTT for a test program.
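A rough sketch of the semaphore variant, using the named-semaphore calls listed above, might look like this. It is an illustration under assumptions rather than a tested recipe: the helper name sem_rendezvous and its sem_name parameter are made up, and the handler installation and kill() calls are only marked with comments.

```c
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the parent saw the child's "ready"
 * post and reaped it, -1 on error. sem_name must start with '/'. */
int sem_rendezvous(const char *sem_name)
{
    /* Initial value 0: sem_wait() blocks until someone posts. */
    sem_t *ready = sem_open(sem_name, O_CREAT | O_EXCL, 0600, 0);
    if (ready == SEM_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        sem_close(ready);
        sem_unlink(sem_name);
        return -1;
    }

    if (pid == 0) {
        /* Child: the signal handlers would be installed here. */
        sem_post(ready);   /* tell the parent the setup is done */
        sem_close(ready);
        /* The real child would now wait for SIGUSR1/SIGUSR2. */
        _exit(EXIT_SUCCESS);
    }

    /* Parent: the post is counted, so it is not lost even if the child
     * runs first. Retry if a signal interrupts the wait. */
    int ok = 1;
    while (sem_wait(ready) == -1) {
        if (errno != EINTR) {
            ok = 0;
            break;
        }
    }
    /* The real parent would call kill(pid, SIGUSR1 or SIGUSR2) here. */

    if (waitpid(pid, NULL, 0) != pid)
        ok = 0;
    sem_close(ready);
    sem_unlink(sem_name); /* remove the name so nothing is left behind */
    return ok ? 0 : -1;
}
```

Unlike a signal sent before pause(), a post made before sem_wait() is remembered, which is what removes the race. On some systems the program may need to be linked with -pthread.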