I have a Bash script which performs many actions on a file, for example:
cp input.txt file.tmp1
sed (code) file.tmp1 > file.tmp2
sed (code) file.tmp2 > file.tmp3
sed (code) file.tmp3 > file.tmp4
sed (code) file.tmp4 > file.tmp5
sed (code) file.tmp5 > file.tmp6
sed (code) file.tmp6 > file.tmp7
cp file.tmp7 output.txt
In this way:
- The original file is unchanged.
- I can check the file's changes at each stage, just to make sure my code did not do anything wrong.
However, this does not seem like an ideal way to handle the files.
- Is there a better way to do this?
- Is there any tool which can help inspect the changes, just to see if anything unusual was introduced?
Working on a temporary file is a fine idea, but you should use mktemp(1) to make your temporary file safely.
While there’s nothing wrong with using multiple files for multiple passes, consider using mktemp -d to create a temporary directory for all your files to ensure you never overwrite anything the user cares about.
But if you’re never going to look at the intermediate files, multiple passes can be handled in a single pipeline, with each sed reading the output of the one before it.
If one fails, they all fail, which can make for easier error handling. There are no temporary files to remove when you’re finished.
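As a rough sketch of the single-pipeline approach, something like the following could work. The sed expressions and the sample input are hypothetical stand-ins for the real (code) and input.txt, and the scratch directory exists only so the example can run on its own. Note that "if one fails, they all fail" depends on the pipefail option; without it, bash reports only the exit status of the last command in the pipeline.

```shell
#!/usr/bin/env bash
# Abort on errors and unset variables; pipefail makes the whole
# pipeline fail if any stage fails, not just the last one.
set -euo pipefail

# Scratch directory for this demo only, so nothing real is overwritten.
demo=$(mktemp -d)
cd "$demo"

# Sample input (assumption: a real script would already have input.txt).
printf 'foo bar\nbaz foo\n' > input.txt

# Each sed reads the previous stage's output. The substitutions are
# placeholders for the real edits. input.txt itself is never modified.
sed -e 's/foo/FOO/g' input.txt \
  | sed -e 's/bar/BAR/' \
  | sed -e 's/baz/BAZ/' \
  > output.txt
```

When the passes are independent of each other, several -e expressions given to a single sed invocation may do the same job with fewer processes.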
If you’d like to inspect the pipeline for errors, tee will help you. It copies its input both to its standard output and to a file, so an intermediate stage can be saved without interrupting the pipeline.
You can inspect the changes by using diff -u input.txt output.txt. diff(1) is a line-wise differences program, and the -u unified output is pretty easy to read.
wdiff(1) is a word-wise differences program, which might be more useful for some cases.
And xxdiff(1) is a superb GUI for inspecting the differences between two files; it will go to some effort to show you individually changed characters. (It is also fantastic for handling CVS- and SVN-style conflict files, but that’s another matter completely.)
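The mktemp -d, tee, and diff -u suggestions above can be combined. The following is only a sketch under assumptions: the stage file names, the show_diff helper, the sample input, and the sed expressions are all made up for illustration, and a real script would read an existing input.txt and write output.txt somewhere permanent.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Private scratch directory; removed automatically when the script exits.
stages=$(mktemp -d)
trap 'rm -rf "$stages"' EXIT

# Sample input so the sketch runs on its own (hypothetical content).
input="$stages/input.txt"
output="$stages/output.txt"
printf 'foo bar\nbaz foo\n' > "$input"

# tee saves a copy of each intermediate stage while the data keeps
# flowing down the pipeline. The sed expressions are placeholders.
sed -e 's/foo/FOO/g' "$input" \
  | tee "$stages/stage1.txt" \
  | sed -e 's/bar/BAR/' \
  | tee "$stages/stage2.txt" \
  | sed -e 's/baz/BAZ/' \
  > "$output"

# diff exits with 1 when the files differ (expected here) and 2 on real
# trouble, so accept status 1 and let anything else abort the script.
show_diff() {
  diff -u "$1" "$2" || [ "$?" -eq 1 ]
}

# Show what each stage changed relative to the one before it.
show_diff "$input" "$stages/stage1.txt"
show_diff "$stages/stage1.txt" "$stages/stage2.txt"
show_diff "$stages/stage2.txt" "$output"
```

Each unified diff then shows exactly what one pass changed, which makes an unexpected edit easier to trace to the sed command that introduced it.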