Problem: comparing a node's pre-check and post-check status files for specific parameters.
With some help from the community, I have written the following solution, which extracts the information from the files in the pre and post directories and groups it by “Node-ID” (which is unique and also has to be extracted from the files). After extracting the data from the pre/post folders, I create one folder per node ID and write the extracted files into it.
My code to extract the data (it is extracted from both the pre and post folders; the post-check version is shown):
FILES=$(find postcheck_logs -type f -name *.log)
for f in $FILES
do
    NODE=`cat $f | grep -m 1 ">" | awk '{print $1}' | sed 's/[>]//g'` ##Generate the node-id
    echo "Extracting Post check information for " $NODE
    mkdir temp/$NODE-post ## create a temp directory
    cat $f | awk 'BEGIN { RS=$NODE"> "; } /^param1/ { foo=RS $0; } END { print foo ; }' > temp/$NODE-post/param1.txt ## extract data
    cat $f | awk 'BEGIN { RS=$NODE"> "; } /^param2/ { foo=RS $0; } END { print foo ; }' > temp/$NODE-post/param2.txt
    cat $f | awk 'BEGIN { RS=$NODE"> "; } /^param3/ { foo=RS $0; } END { print foo ; }' > temp/$NODE-post/param3.txt
done
After this I have a structure like:
/Node1-pre/param1.txt
/Node1-post/param1.txt
and so on.
Now I am stuck on comparing the $NODE-pre and $NODE-post files.
I have tried recursive grep, but I have not found a suitable way to do it. What is the best way to compare these files using diff?
Moreover, the extraction script above is very slow, and I suspect it is not the most efficient (least resource-hungry) way to do this. Any suggestions?
Look askance at any instance of cat one-file: you could use I/O redirection on the next command in the pipeline instead. You can do the whole thing more simply, with one sed and one awk per log file.
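A sketch of that simpler version, reconstructed from the description in the following paragraphs rather than copied from the original answer: one sed call finds the node ID and one awk pass writes all three parameter files. The function name extract_node_params and the precheck_logs directory name are hypothetical. The sketch assumes file names without white space and an awk that treats a multi-character RS as a regular expression (gawk and mawk do; POSIX leaves a multi-character RS unspecified).

```shell
#!/bin/sh
# Sketch: extract the param1..param3 sections from every log under a
# directory into temp/<NODE>-<suffix>/paramN.txt, one sed and one awk per file.
# Assumptions: file names contain no white space (the $(find ...) loop
# word-splits) and awk accepts a multi-character RS (gawk and mawk do).

extract_node_params() {
    logdir=$1   # e.g. postcheck_logs
    suffix=$2   # "pre" or "post"
    for file in $(find "$logdir" -type f -name '*.log')
    do
        # Node ID: first word of the first line containing '>', minus the '>'.
        NODE=$(sed -n '/>/{s/^[[:space:]]*//; s/[[:space:]].*//; s/>//g; p; q;}' "$file")
        echo "Extracting $suffix-check information for $NODE"
        DIR="temp/$NODE-$suffix"
        mkdir -p "$DIR"
        # Single pass over the log; NODE and DIR are handed to awk with -v.
        awk -v NODE="$NODE" -v DIR="$DIR" '
            BEGIN     { RS = NODE "> " }   # each record starts after a prompt
            /^param1/ { param1 = $0 }      # keep the last matching record
            /^param2/ { param2 = $0 }
            /^param3/ { param3 = $0 }
            END {
                print RS param1 > (DIR "/param1.txt")
                print RS param2 > (DIR "/param2.txt")
                print RS param3 > (DIR "/param3.txt")
            }' "$file"
    done
}

# precheck_logs is a hypothetical name for the pre-check directory.
extract_node_params precheck_logs pre
extract_node_params postcheck_logs post
```

Because the node ID and output directory are passed in with -v, the awk program can stay in single quotes, and each log is read once instead of being piped through cat, grep, awk and sed and then scanned three more times.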
The NODE-finding step is much better done by a single sed command than by cat | grep | awk | sed, and you should plan to use $(...) rather than back-quotes everywhere.

The main processing of the log file should be done once; a single awk command is sufficient. The script is passed two variables: NODE and the directory name. The BEGIN block is cleaned up; the $ before NODE was probably not what you intended. Inside the single-quoted awk program the shell's $NODE is never expanded; awk reads $NODE as the field numbered by its own unset variable NODE, that is $0, so in practice RS was just "> ". The main actions are very similar; each looks for the relevant parameter name and saves the record in an appropriate variable. At the end, it writes the saved values to the relevant files, prefixed with the value of RS. Semicolons are only needed when there is more than one statement on a line; there is just one statement per line in this expanded script. It looks bigger than the original, but only because it uses more vertical space.

As to comparing the before and after files, you can do it in many ways, depending on what you want to know. If you have a POSIX-compliant diff (you probably do), you can use diff -r temp/$NODE-pre temp/$NODE-post to report on the differences, if any, between the contents of the two directories. Alternatively, you can compare the files manually, one pair at a time.
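A minimal sketch of both approaches for a single node, assuming the temp/<NODE>-pre and temp/<NODE>-post layout from the question; the function names compare_node_recursive and compare_node_manual are hypothetical. The manual version uses cmp -s to skip identical pairs silently and only runs diff on the files that differ.

```shell
#!/bin/sh
# Sketch: two ways to compare one node's pre- and post-check extracts.
# Assumes the temp/<NODE>-pre and temp/<NODE>-post layout from the question.

# Option 1: let diff walk both directories (-r is part of POSIX diff).
compare_node_recursive() {
    diff -r "temp/$1-pre" "temp/$1-post"
}

# Option 2: compare file by file, reporting only the pairs that differ.
compare_node_manual() {
    changed=0
    for param in param1.txt param2.txt param3.txt
    do
        # cmp -s prints nothing and exits 0 when the two files are identical.
        if ! cmp -s "temp/$1-pre/$param" "temp/$1-post/$param"
        then
            echo "== $1: $param changed"
            diff "temp/$1-pre/$param" "temp/$1-post/$param"
            changed=1
        fi
    done
    return $changed
}
```

For example, compare_node_manual Node1 prints only the parameters whose extracts changed. Both functions return a non-zero status when something differs, so either can be used directly in an if.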
Clearly, you can wrap that in a ‘for each node’ loop. And if you are going to do that, you probably do want to capture the output of the find command in a variable (as in the original code) so that you do not have to repeat that operation.
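A sketch of that per-node loop, assuming the same temp/ layout and the node-ID rule used during extraction; node_of and compare_all_nodes are hypothetical names. The log list is captured once in FILES and passed in as arguments, so the same variable could also drive the extraction loop without running find again.

```shell
#!/bin/sh
# Sketch: capture the log list once, then compare every node's extracts.
# Assumes the temp/<NODE>-pre and temp/<NODE>-post layout and log file
# names without white space.

# Print the node ID: first word of the first line containing '>', minus '>'.
node_of() {
    sed -n '/>/{s/^[[:space:]]*//; s/[[:space:]].*//; s/>//g; p; q;}' "$1"
}

# Compare pre/post extracts for the nodes behind the given log files.
# Returns non-zero if any node differs or is missing data.
compare_all_nodes() {
    status=0
    for file in "$@"
    do
        NODE=$(node_of "$file")
        echo "=== $NODE"
        if [ -d "temp/$NODE-pre" ] && [ -d "temp/$NODE-post" ]
        then
            diff -r "temp/$NODE-pre" "temp/$NODE-post" || status=1
        else
            echo "missing pre or post data for $NODE" >&2
            status=1
        fi
    done
    return $status
}

# Run find once; the same $FILES can also feed the extraction loop.
FILES=$(find postcheck_logs -type f -name '*.log')
compare_all_nodes $FILES
```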