I have an awk command that outputs entries absent from $NEWFILE but found in $OLDFILE:
awk -F "|" 'NR==FNR{a[$4]++}!a[$4]' "$NEWFILE" "$OLDFILE" > "$OUTFILE"
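This command works well when none of the entries for an entity (which all share the unique identifier in field 4) are found in $NEWFILE. However, it fails when only some of that entity's entries, but not all, have been removed from $NEWFILE: the identifier still appears in $NEWFILE, so none of the entity's lines are output.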
Does anyone have a suggestion for how I can tweak this awk command to output every entry that is absent from $NEWFILE but present in $OLDFILE, regardless of whether all of an entity's entries were removed?
If I understand you correctly, this is what you want
Since NEWFILE doesn't have the URLs that are present in OLDFILE, whole lines never match, so the unique row identifier has to be the composite of the first four fields. For the same reason, a simple diff won't do.
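A minimal sketch of the composite-key idea, assuming the files are "|"-delimited as in the question and that fields 1-4 together identify a row while any later fields (such as the URLs) may differ. The function name missing_entries and the sample data are hypothetical. It uses FILENAME == ARGV[1] instead of the question's NR==FNR test, which also behaves correctly when NEWFILE is empty (with NR==FNR, an empty first file makes every OLDFILE line look like part of the first file, so nothing is printed).

```shell
#!/bin/sh
# missing_entries NEWFILE OLDFILE  (hypothetical helper name)
# Prints every line of OLDFILE whose composite key (fields 1-4, "|"-separated)
# does not appear in NEWFILE. Assumption: fields 1-4 identify a row uniquely
# and extra fields (e.g. URLs) may differ between the two files.
missing_entries() {
    awk -F'|' '
        # Build the composite key for the current line from fields 1-4.
        { key = $1 FS $2 FS $3 FS $4 }
        # First file (NEWFILE): remember its keys, print nothing.
        FILENAME == ARGV[1] { seen[key] = 1; next }
        # Second file (OLDFILE): print lines whose key was never seen.
        !(key in seen)
    ' "$1" "$2"
}

# Demo with hypothetical sample data: id 42 has two entries in OLDFILE,
# and only one of them is still present in NEWFILE (without its URL).
new=$(mktemp)
old=$(mktemp)
printf '%s\n' 'x|y|1|42' > "$new"
printf '%s\n' 'x|y|1|42|http://a' 'x|y|2|42|http://b' > "$old"
missing_entries "$new" "$old"   # prints: x|y|2|42|http://b
rm -f "$new" "$old"
```

With the same sample input, the original a[$4] command prints nothing, because id 42 still occurs in NEWFILE.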