I have a program that sshes into a server and gets data. I fork; the child executes the query, and the parent waits for the child for a predetermined amount of time (in the function timeout) and then kills it. I did that because sometimes, I am not exactly sure why, the ssh connection stops and does not exit. That is, there is an “ssh -oConnectTimeout=60 blah blah” in the process list for a long time, and the timeout function doesn't seem to work. What am I doing wrong here? The last time this problem occurred, there was an ssh in the process list for 5 days; it still did not time out, and the program had stopped because it was waiting for the child. The extra wait() calls are there because previously I was getting a lot of defunct processes, a.k.a. zombies, so I took the easy way out. Here is the code:
c = fork();
if (c == 0) {
    close(fd[READ]);
    if (dup2(fd[WRITE], STDOUT_FILENO) != -1)
        execlp("ssh", "ssh -oConnectTimeout=60", serverDetails.c_str(), NULL);
    _exit(1);
} else {
    if (timeout(c) == 1) {
        kill(c, SIGTERM);
        waitpid(c, &exitStatus, WNOHANG);
        wait(&exitStatus);
        return 0;
    }
    wait(&exitStatus);
}
This is the timeout function.
int timeout(int childPID)
{
    int times = 0, max_times = 10, status, rc;
    while (times < max_times) {
        sleep(5);
        rc = waitpid(childPID, &status, WNOHANG);
        if (rc < 0) {
            perror("waitpid");
            exit(1);
        }
        if (WIFEXITED(status) || WIFSIGNALED(status)) {
            /* child exits */
            break;
        }
        times++;
    }
    if (times >= max_times) {
        return 1;
    }
    else return 0;
}
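SIGTERM just asks for a polite termination of the process. If it’s got stuck, then it may not respond to that, and you’ll need to use SIGKILL to kill it, probably after trying SIGTERM and waiting a little while. In your code, the blocking wait() that follows kill() never returns if the child does not die, which would explain the parent hanging along with it.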
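A minimal sketch of that escalation, in case it helps. The helper names reapWithin and terminateChild, the grace period, and the 100 ms polling interval are made up for illustration and are not from the original program.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: poll for the child's exit for up to `seconds`.
// Returns true once there is nothing left to wait for.
static bool reapWithin(pid_t child, int seconds, int &status)
{
    for (int i = 0; i < seconds * 10; ++i) {
        pid_t rc = waitpid(child, &status, WNOHANG);
        if (rc == child)
            return true;                 // child reaped, status is valid
        if (rc < 0 && errno != EINTR)
            return true;                 // e.g. ECHILD: already reaped elsewhere
        usleep(100 * 1000);              // rc == 0: still running, check again in 100 ms
    }
    return false;
}

// Hypothetical helper: ask politely with SIGTERM, then force with SIGKILL.
// Returns the wait status of the child.
static int terminateChild(pid_t child, int graceSeconds)
{
    int status = 0;
    kill(child, SIGTERM);
    if (!reapWithin(child, graceSeconds, status)) {
        // SIGKILL cannot be caught or ignored, so the blocking wait below
        // normally returns promptly.
        kill(child, SIGKILL);
        while (waitpid(child, &status, 0) < 0 && errno == EINTR) {
            // interrupted by a signal in the parent: retry
        }
    }
    return status;
}
```

In the parent branch of the posted code, a call such as terminateChild(c, 5) would take the place of the kill(), waitpid(), wait() sequence.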
The other possibility is that the child is blocked writing to the output pipe to the parent process. The parent does not read from the pipe while it sits in timeout(), so maybe there’s enough output to fill the pipe buffer, and the child is waiting on that rather than on the network.