I asked a question about this a few days ago. My solution is something along the lines of what was suggested in the accepted answer. However, a friend of mine came up with the following solution:
Please note that the code has been updated a few times (check the edit revisions) to reflect the suggestions in the answers below. If you intend to give a new answer, please do so with this new code in mind and not the old one which had lots of problems.
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        int fd[2], i, aux, std0, std1;

        do {
            std0 = dup(0); // backup stdin
            std1 = dup(1); // backup stdout

            // let's pretend I'm reading commands here in a shell prompt
            READ_COMMAND_FROM_PROMPT();

            for (i = 1; i < argc; i++) {
                // do we have a previous command?
                if (i > 1) {
                    dup2(aux, 0);
                    close(aux);
                }

                // do we have a next command?
                if (i < argc-1) {
                    pipe(fd);

                    aux = fd[0];
                    dup2(fd[1], 1);
                    close(fd[1]);
                }

                // last command? restore stdout...
                if (i == argc-1) {
                    dup2(std1, 1);
                    close(std1);
                }

                if (!fork()) {
                    // if not last command, close all pipe ends
                    // (the child doesn't use them)
                    if (i < argc-1) {
                        close(std0);
                        close(std1);
                        close(fd[0]);
                    }

                    execlp(argv[i], argv[i], NULL);
                    exit(0);
                }
            }

            // restore stdin to be able to keep using the shell
            dup2(std0, 0);
            close(std0);
        } while (1); // a real shell would loop until the user exits

        return 0;
    }
This simulates a series of commands through pipes like in bash, for instance: cmd1 | cmd2 | … | cmd_n. I say “simulate”, because, as you can see, the commands are actually read from the arguments. Just to spare time coding a simple shell prompt…
Of course, there are some issues to fix and things to add, like error handling, but that’s not the point here. I think I kinda get the code, but I still find it quite confusing how this whole thing works.
Am I missing something, or does this really work, and is it a nice and clean solution to the problem? If not, can anyone point out the crucial problems this code has?
Looks reasonable, though it really needs to fix leaking std and aux to the children and after the loop, and the parent’s original stdin is lost forever.
This would probably be better with color…
    ./a.out foo bar baz <stdin >stdout

    std = dup(stdout)
    pipe(fd)              pipe1[0] -- pipe1[1]
    aux = fd[0]
    dup2(fd[1], 1)
    close(fd[1])
    fork+exec(foo)
    dup2(aux, 0)
    close(aux)
    pipe(fd)              pipe2[0] -- pipe2[1]
    aux = fd[0]
    dup2(fd[1], 1)
    close(fd[1])
    fork+exec(bar)
    dup2(aux, 0)
    close(aux)
    pipe(fd)              pipe3[0] -- pipe3[1]
    aux = fd[0]
    dup2(fd[1], 1)
    close(fd[1])
    dup2(std, 1)
    fork+exec(baz)

foo gets stdin=stdin, stdout=pipe1[1]
bar gets stdin=pipe1[0], stdout=pipe2[1]
baz gets stdin=pipe2[0], stdout=stdout
My suggestion is different in that it avoids mangling the parent’s stdin and stdout, only manipulating them within the child, and never leaks any FDs. It’s a bit harder to diagram, though.

    parent
        cmds = [foo, bar, baz]
        fds = {0: stdin, 1: stdout}

    cmd = cmds[0] {
        there is a next cmd {
            pipe(new_fds)
                new_fds = {3, 4}
                fds = {0: stdin, 1: stdout, 3: pipe1[0], 4: pipe1[1]}
        }

        fork => child
            there is a next cmd {
                close(new_fds[0])
                    fds = {0: stdin, 1: stdout, 4: pipe1[1]}
                dup2(new_fds[1], 1)
                    fds = {0: stdin, 1: pipe1[1], 4: pipe1[1]}
                close(new_fds[1])
                    fds = {0: stdin, 1: pipe1[1]}
            }
            exec(cmd)

        there is a next cmd {
            old_fds = new_fds
                old_fds = {3, 4}
        }
    }

    cmd = cmds[1] {
        there is a next cmd {
            pipe(new_fds)
                new_fds = {5, 6}
                fds = {0: stdin, 1: stdout, 3: pipe1[0], 4: pipe1[1],
                       5: pipe2[0], 6: pipe2[1]}
        }

        fork => child
            there is a previous cmd {
                dup2(old_fds[0], 0)
                    fds = {0: pipe1[0], 1: stdout, 3: pipe1[0], 4: pipe1[1],
                           5: pipe2[0], 6: pipe2[1]}
                close(old_fds[0])
                    fds = {0: pipe1[0], 1: stdout, 4: pipe1[1],
                           5: pipe2[0], 6: pipe2[1]}
                close(old_fds[1])
                    fds = {0: pipe1[0], 1: stdout,
                           5: pipe2[0], 6: pipe2[1]}
            }
            there is a next cmd {
                close(new_fds[0])
                    fds = {0: pipe1[0], 1: stdout, 6: pipe2[1]}
                dup2(new_fds[1], 1)
                    fds = {0: pipe1[0], 1: pipe2[1], 6: pipe2[1]}
                close(new_fds[1])
                    fds = {0: pipe1[0], 1: pipe2[1]}
            }
            exec(cmd)

        there is a previous cmd {
            close(old_fds[0])
                fds = {0: stdin, 1: stdout, 4: pipe1[1],
                       5: pipe2[0], 6: pipe2[1]}
            close(old_fds[1])
                fds = {0: stdin, 1: stdout, 5: pipe2[0], 6: pipe2[1]}
        }

        there is a next cmd {
            old_fds = new_fds
                old_fds = {5, 6}
        }
    }

    cmd = cmds[2] {
        fork => child
            there is a previous cmd {
                dup2(old_fds[0], 0)
                    fds = {0: pipe2[0], 1: stdout, 5: pipe2[0], 6: pipe2[1]}
                close(old_fds[0])
                    fds = {0: pipe2[0], 1: stdout, 6: pipe2[1]}
                close(old_fds[1])
                    fds = {0: pipe2[0], 1: stdout}
            }
            exec(cmd)

        there is a previous cmd {
            close(old_fds[0])
                fds = {0: stdin, 1: stdout, 6: pipe2[1]}
            close(old_fds[1])
                fds = {0: stdin, 1: stdout}
        }
    }

Edit
Your updated code does fix the previous FD leaks… but adds one: you’re now leaking std0 to the children. As Jon says, this is probably not dangerous to most programs… but you still should write a better behaved shell than this.
Even if it’s temporary, I would strongly recommend against mangling your own shell’s standard in/out/err (0/1/2), only doing so within the child right before exec. Why? Suppose you add some printf debugging in the middle, or you need to bail out due to an error condition. You’ll be in trouble if you don’t clean up your messed-up standard file descriptors first. Please, for the sake of having things operate as expected even in unexpected scenarios, don’t muck with them until you need to.
Edit
As I mentioned in other comments, splitting it up into smaller parts makes it much easier to understand. This small helper should be easily understandable and bug-free:
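The helper's code does not appear here, so what follows is only a minimal sketch of what such a helper might look like. The names fork_and_exec_with_fds and wait_for_exit, and the convention that -1 means "leave this descriptor alone", are assumptions made for illustration and are not necessarily the code originally posted. The point it tries to show is that stdin and stdout are rewired only inside the child, right before exec, so the parent's own descriptors are never touched.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (name and signature are assumptions).
 * cmd, argv:      passed to execvp
 * fd_in, fd_out:  when not -1, become the child's stdin and stdout
 * returns:        pid of the child in the parent, or -1 if fork failed
 */
pid_t fork_and_exec_with_fds(const char *cmd, char *const argv[],
                             int fd_in, int fd_out)
{
    assert(cmd != NULL && argv != NULL);

    pid_t child = fork();
    if (child != 0)
        return child;   /* parent, or fork error: nothing was changed */

    /* Child only: redirect, then drop the now-duplicated descriptors. */
    if (fd_in != -1 && fd_in != STDIN_FILENO) {
        if (dup2(fd_in, STDIN_FILENO) == -1) {
            perror("dup2");
            _exit(127);
        }
        close(fd_in);
    }
    if (fd_out != -1 && fd_out != STDOUT_FILENO) {
        if (dup2(fd_out, STDOUT_FILENO) == -1) {
            perror("dup2");
            _exit(127);
        }
        close(fd_out);
    }

    execvp(cmd, argv);
    perror(cmd);        /* only reached if exec failed */
    _exit(127);
}

/* Hypothetical companion: wait for one child and return its exit code,
 * or -1 if it did not exit normally. */
int wait_for_exit(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would create a pipe, pass its write end as fd_out for one command and its read end as fd_in for the next, and close both ends in the parent once the children have been started. In this sketch the child still inherits any other descriptors the parent holds; closing them, or marking them close-on-exec, is left to the caller.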
As should this:
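The second piece of code is also missing here. As a rough sketch only, a pipeline driver in the spirit of the old_fds/new_fds trace earlier in this answer could look like the following. The name run_pipeline, its signature, and its return convention are assumptions for illustration; to keep the block self-contained, the fork and exec step is written inline rather than through a separate helper.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: runs argvs[0] | argvs[1] | ... | argvs[n-1].
 * Each argvs[i] is a NULL-terminated argument vector.
 * Returns the exit code of the last command, or -1 on error. */
int run_pipeline(char **argvs[], int n)
{
    int old_fds[2] = { -1, -1 };  /* pipe coming from the previous command */
    int new_fds[2] = { -1, -1 };  /* pipe going to the next command */
    pid_t last = -1, w;
    int status, result = -1, failed = 0;

    assert(argvs != NULL && n >= 0);

    for (int i = 0; i < n && !failed; i++) {
        int has_prev = i > 0;
        int has_next = i < n - 1;
        pid_t pid = -1;

        if (has_next && pipe(new_fds) == -1) {
            perror("pipe");
            has_next = 0;         /* no new pipe exists to clean up */
            failed = 1;
        } else if ((pid = fork()) == -1) {
            perror("fork");
            failed = 1;
        } else if (pid == 0) {
            /* Child: the only place where fds 0 and 1 are changed. */
            if (has_prev) {
                dup2(old_fds[0], STDIN_FILENO);
                close(old_fds[0]);
                close(old_fds[1]);
            }
            if (has_next) {
                close(new_fds[0]);
                dup2(new_fds[1], STDOUT_FILENO);
                close(new_fds[1]);
            }
            execvp(argvs[i][0], argvs[i]);
            perror(argvs[i][0]);
            _exit(127);
        }

        /* Parent: the previous pipe is no longer needed here. */
        if (has_prev) {
            close(old_fds[0]);
            close(old_fds[1]);
        }
        if (has_next) {
            if (failed) {
                close(new_fds[0]);
                close(new_fds[1]);
            } else {
                old_fds[0] = new_fds[0];
                old_fds[1] = new_fds[1];
            }
        }
        if (!failed)
            last = pid;
    }

    /* Reap every child; remember the status of the last command. */
    while ((w = wait(&status)) > 0)
        if (w == last && !failed)
            result = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return result;
}
```

With argument vectors for echo hello and grep hello, run_pipeline would be expected to return grep's exit code. Note that the wait loop reaps any child of the calling process, which is acceptable for a sketch but would need refining in a shell that also runs background jobs.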
You can see Bash’s execute_cmd.c#execute_disk_command being called from execute_cmd.c#execute_pipeline, xsh’s process.c#process_run being called from jobs.c#job_run, and even every single one of BusyBox’s various small and minimal shells splits them up.