I have a tab-delimited file with 6 columns (only 2 are shown here for simplicity):
46_#1 A
47_#1 B
49_#1 C
51_#1 D
51_#1 E
I want to count duplicates in the first column (count only, no removal) and store the count in the next column. So the output should be:
46_#1 1 A
47_#1 1 B
49_#1 1 C
51_#1 2 D
51_#1 2 E
I tried the Linux command:
uniq -c file
but this compares whole lines (not just the first column). Then I tried
uniq -c -w5 file
But -w5 compares only the first 5 characters, and the length of the first column can vary.
Can anyone help please?
PS: I have a very big file (around 1 GB).
I don’t like just providing complete solutions, but it seemed the easiest way to explain. This program reads through the file twice: first to accumulate the frequency information and then to output the modified data.
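What follows is a minimal Python sketch of a two-pass approach like the one described, not necessarily the original program. The function name `add_counts` and the UTF-8 encoding are assumptions made for illustration. It assumes the columns are separated by tabs and that the count should be inserted as a new second column.

```python
import sys
from collections import Counter


def add_counts(path, out=sys.stdout):
    """Copy a tab-delimited file to `out`, inserting the frequency of each
    line's first column as a new second column. (Hypothetical helper.)"""
    counts = Counter()

    # Pass 1: count how often each first-column value occurs.
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            key = line.rstrip("\n").split("\t", 1)[0]
            counts[key] += 1

    # Pass 2: re-read the file and write each line with its count inserted.
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            key, sep, rest = line.rstrip("\n").partition("\t")
            out.write(f"{key}\t{counts[key]}{sep}{rest}\n")
```

Calling `add_counts("file")` prints the result to standard output. Only the counts for the distinct first-column values are kept in memory, so a file of around 1 GB should be manageable as long as the number of distinct values is not itself huge. Unlike `uniq`, this does not require the duplicates to be on adjacent lines.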