I have a file that looks like this (3 columns and n rows):
chr8 101999980 102031975
chr8 101999980 102033533
chr8 101999980 102033533
chr8 101999980 102032736
chr8 101999980 102034799
chr8 101999980 102034799
chr8 101999980 102034397
chr8 101999980 102032736
From this data I want to remove the redundant lines with a bash script. The exact repeated lines could be present anywhere in the dataset.
If the order of the rows does not matter, sort the file and keep only the unique lines.
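A minimal sketch of that idea, assuming the data is in a plain text file (the temporary file and the variable names here are only for illustration, not from the original post). It also shows an awk one-liner as an alternative for the case where the original row order should be kept:

```bash
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file (illustrative path).
input=$(mktemp)
cat > "$input" <<'EOF'
chr8 101999980 102031975
chr8 101999980 102033533
chr8 101999980 102033533
chr8 101999980 102032736
chr8 101999980 102034799
chr8 101999980 102034799
chr8 101999980 102034397
chr8 101999980 102032736
EOF

# Order does not matter: sort the lines and keep one copy of each.
deduped_sorted=$(sort -u "$input")

# Order matters: print a line only the first time it is seen.
deduped_ordered=$(awk '!seen[$0]++' "$input")

printf '%s\n' "$deduped_sorted"
```

Both variants leave 5 distinct rows on this sample; they differ only in the order of the output.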
uniq only removes adjacent identical rows, which is why you need sort. In your sample most duplicates come right next to each other, but the row ending in 102032736 appears twice with other rows in between, so uniq alone would keep both copies. Unless duplicates are guaranteed to be adjacent, sort the file first.
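One way to check whether sorting is needed for a given file is to compare the line counts with and without it. The sketch below does this on the sample from the question; the temporary file and variable names are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file (illustrative path).
input=$(mktemp)
cat > "$input" <<'EOF'
chr8 101999980 102031975
chr8 101999980 102033533
chr8 101999980 102033533
chr8 101999980 102032736
chr8 101999980 102034799
chr8 101999980 102034799
chr8 101999980 102034397
chr8 101999980 102032736
EOF

# uniq compares each line only with the line directly before it.
adjacent_only=$(uniq "$input" | wc -l | tr -d ' ')

# Sorting first brings all identical lines together before uniq runs.
after_sort=$(sort "$input" | uniq | wc -l | tr -d ' ')

echo "uniq alone:  $adjacent_only lines"   # 6 on this sample
echo "sort | uniq: $after_sort lines"      # 5 on this sample
```

If the two counts match, every duplicate in the file was already adjacent; if they differ, as they do here, some duplicates were separated by other rows.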
Without sort the sample returns 6 lines and with sort it returns 5, so the duplicates are not all adjacent. You also did not say whether adjacent duplicates are the standard case.