If I want to remove lines where certain fields are duplicated then I use sort -u -k n,n.
But this keeps one occurrence. If I want to remove all occurrences of the duplicate is there any quick bash or awk way to do this?
E.g. I have:
1 apple 30
2 banana 21
3 apple 9
4 mango 2
I want:
2 banana 21
4 mango 2
I will presort and then use a hash in Perl, but for very large files this is going to be slow.
Try
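sort -k <your fields> | awk '{print $3, $1, $2}' | uniq -f2 -u | awk '{print $2, $3, $1}'
to remove all lines that are duplicated (without keeping any copies). For the example above the sort key would be -k 2,2. The first awk moves the last field to the front so that uniq -f2 skips two fields and compares only the key, -u prints only the lines that are not repeated, and the second awk puts the fields back in their original order. If you don't need the last field, change the first awk command to just cut -f 1-2 -d ' ', change the -f2 in uniq to -f1, and remove the second awk command.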
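If the sort itself is what makes this slow, a two-pass awk sketch may be worth trying instead. It is not the pipeline above but a different technique: read the file once to count each key, then read it again and print only the lines whose key was seen exactly once. The function name drop_all_dups and its arguments are made up for illustration, and it assumes whitespace-separated fields and a regular file (not a pipe), since the file is read twice.

```shell
# Hypothetical helper: print only the lines whose key field occurs exactly once.
# Usage: drop_all_dups FIELD_NUMBER FILE
drop_all_dups() {
    awk -v k="$1" '
        NR == FNR { count[$k]++; next }   # first pass: count occurrences of each key
        count[$k] == 1                    # second pass: print lines with a unique key
    ' "$2" "$2"
}
```

With the sample data saved in a file, drop_all_dups 2 file prints the banana and mango lines in their original order. No sort is needed, but awk holds one counter per distinct key in memory, so whether this is faster than sorting on a very large file depends on how many distinct keys there are.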