I am new to bash programming (grep, uniq, sort, etc.) and I am having trouble removing duplicates from a file with the following format:
--
name: joe
tag: 123
--
name: mike
tag: 000
--
name: dave
tag: 123
--
name: loopy
tag: 123
--
Basically, what I want is to remove the records that share a tag number with an earlier record, keeping only the first one for each tag, like this:
--
name: joe
tag: 123
--
name: mike
tag: 000
--
This task is a pretty good fit for awk. If you have gawk or mawk available, you can accomplish it by setting the record separator appropriately:
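A minimal sketch of such a command, assuming records are delimited by `--` lines exactly as in the question. The function name `dedup_by_tag` is hypothetical and only used for illustration. The sketch relies on the empty record that precedes the first `--` to emit the opening delimiter, and on gawk or mawk treating a multi-character `RS` as a regular expression:

```shell
# Hypothetical helper: print only the first record seen for each tag.
# RS splits the input on "--\n"; ORS writes the same delimiter back
# after every printed record. With default field splitting, $4 is the
# tag value ("name:", "joe", "tag:", "123").
dedup_by_tag() {
    awk -v RS='--\n' -v ORS='--\n' '!h[$4]++' "$@"
}

# Sample input taken from the question.
dedup_by_tag <<'EOF'
--
name: joe
tag: 123
--
name: mike
tag: 000
--
name: dave
tag: 123
--
name: loopy
tag: 123
--
EOF
```

The helper reads standard input when called without arguments, or the files named on its command line otherwise.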
Output:
This works by remembering which tags have been seen (h[$4]++), i.e. the fourth field in each record. The bang (!) in front of the increment ensures that the condition is only true when h[$4] is zero, so the default rule ({ print $0 }) is only invoked the first time a tag is seen.
A slightly shorter version:
Edit – handle records where name fields have spaces
The field count would vary if the name field has spaces, so the tag would no longer be the fourth field. You can handle this by doing the field splitting a bit differently:
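One way to do this, sketched here as an assumption rather than as the original command, is to split fields on newlines so that the whole `tag: ...` line becomes the second field regardless of how many words the name contains. The function name `dedup_by_tag_line` is hypothetical, and the sample names are made up for illustration:

```shell
# Hypothetical helper: deduplicate on the entire second line of a record.
# FS='\n' makes every line one field, so $2 is "tag: 123" even when the
# name on the first line contains spaces.
dedup_by_tag_line() {
    awk -v RS='--\n' -v ORS='--\n' -v FS='\n' '!h[$2]++' "$@"
}

# Illustrative input with multi-word names (not from the question).
dedup_by_tag_line <<'EOF'
--
name: joe smith
tag: 123
--
name: mike
tag: 000
--
name: dave van der berg
tag: 123
--
EOF
```

This assumes the tag is always the second line of a record; if the line order can vary, the key would have to be extracted by matching the `tag:` prefix instead.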